Production Connection Monitoring
This document explains how to use the ProductionConnectionMonitor to diagnose connection accumulation issues in real time.
Quick Start
import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';
// After starting your proxy
const monitor = new ProductionConnectionMonitor(proxy);
monitor.start(5000); // Check every 5 seconds
// The monitor will automatically capture diagnostics when:
// - Connections exceed 50 (default threshold)
// - Sudden spike of 20+ connections occurs
// - You manually call monitor.forceCaptureNow()
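The two automatic triggers described in the comments above can be sketched as a small decision function. This is only an illustration of the rule, not the monitor's actual code: the names `captureReason`, `accumulationThreshold`, and `spikeThreshold`, as well as the returned reason strings, are hypothetical, and only the values 50 and 20 come from this document.

```javascript
// Hypothetical defaults; only the numbers 50 and 20 are taken from this document.
const DEFAULT_THRESHOLDS = { accumulationThreshold: 50, spikeThreshold: 20 };

// Returns a reason string when a capture should be triggered, otherwise null.
function captureReason(previousCount, currentCount, options = {}) {
  const { accumulationThreshold, spikeThreshold } = {
    ...DEFAULT_THRESHOLDS,
    ...options,
  };
  // "Connections exceed 50": strictly greater than the threshold.
  if (currentCount > accumulationThreshold) return 'threshold_exceeded';
  // "Sudden spike of 20+": growth since the previous check of at least 20.
  if (currentCount - previousCount >= spikeThreshold) return 'sudden_spike';
  return null;
}
```

Under these assumptions, a shorter check interval makes the spike rule less sensitive, because fewer connections can arrive between two consecutive checks.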
What Gets Captured
When accumulation is detected, the monitor saves a JSON file with:
Connection Details
- Socket states (destroyed, readable, writable, readyState)
- Connection age and activity timestamps
- Data transfer statistics (bytes sent/received)
- Target host and port information
- Keep-alive status
- Event listener counts
System State
- Memory usage
- Event loop lag
- Connection count trends
- Termination statistics
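For readers unfamiliar with the system-state metrics, the sketch below shows one common way to sample memory usage and approximate event loop lag in Node.js. It is not taken from the monitor itself: the helper names `snapshotMemory` and `measureEventLoopLag` are hypothetical, and the monitor may measure lag differently.

```javascript
// Hypothetical helper: current memory usage in MiB, rounded to one decimal place.
function snapshotMemory() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMiB: toMiB(rss),
    heapUsedMiB: toMiB(heapUsed),
    heapTotalMiB: toMiB(heapTotal),
  };
}

// Hypothetical helper: resolves with how late a timer fired, in milliseconds.
// A busy event loop delays timer callbacks, so the overshoot approximates lag.
function measureEventLoopLag(delayMs = 100) {
  return new Promise((resolve) => {
    const start = performance.now();
    setTimeout(() => {
      const elapsed = performance.now() - start;
      resolve(Math.max(0, elapsed - delayMs));
    }, delayMs);
  });
}
```

A consistently high lag value alongside a growing connection count would suggest that the process is too busy to run its own cleanup timers on schedule.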
Reading Diagnostic Files
Files are saved to .nogit/connection-diagnostics/ with names like:
accumulation_2025-06-07T20-20-43-733Z_force_capture.json
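When sorting or filtering many diagnostic files, it can help to recover the capture time and trigger from the file name. The sketch below assumes the naming scheme is `accumulation_<timestamp>_<reason>.json`, where the timestamp is an ISO 8601 string with `:` and `.` replaced by `-`. That scheme is inferred from the single example above, and `parseDiagnosticFilename` is a hypothetical helper, not part of the monitor.

```javascript
// Hypothetical helper: parses a diagnostic file name into its capture time and reason.
// Returns null when the name does not match the assumed pattern.
function parseDiagnosticFilename(filename) {
  const pattern =
    /^accumulation_(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})-(\d{3})Z_(.+)\.json$/;
  const match = pattern.exec(filename);
  if (!match) return null;
  const [, date, hours, minutes, seconds, millis, reason] = match;
  return {
    // Restore the ':' and '.' separators so Date can parse the timestamp as UTC.
    capturedAt: new Date(`${date}T${hours}:${minutes}:${seconds}.${millis}Z`),
    reason,
  };
}

const parsed = parseDiagnosticFilename(
  'accumulation_2025-06-07T20-20-43-733Z_force_capture.json'
);
console.log(parsed.capturedAt.toISOString(), parsed.reason);
```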
Key Fields to Check
- Socket States
"incomingState": { "destroyed": false, "readable": true, "writable": true, "readyState": "open" }
- Both destroyed = zombie connection
- One destroyed = half-zombie
- Both alive but old = potential stuck connection
- Data Transfer
"bytesReceived": 36, "bytesSent": 0, "timeSinceLastActivity": 60000
- No bytes sent back = stuck connection
- High bytes but old = slow backend
- No activity = idle connection
- Connection Flags
"hasReceivedInitialData": false, "hasKeepAlive": true, "connectionClosed": false
- hasReceivedInitialData=false on non-TLS = immediate routing
- hasKeepAlive=true = extended timeout applies
- connectionClosed=false = still tracked
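The rules of thumb above can be combined into a rough classifier for a single connection record. This is a sketch under stated assumptions rather than the monitor's own analysis: the field names follow the JSON examples in this document, while `classifyConnection`, the returned labels, and the 60-second cutoff (borrowed from the stuck-connection timeout mentioned in this document) are illustrative choices.

```javascript
// Hypothetical classifier for one connection record from a diagnostic file.
function classifyConnection(record, oldAfterMs = 60000) {
  const incomingDead = record.incomingState?.destroyed === true;
  const outgoingDead = record.outgoingState?.destroyed === true;

  // Both sockets destroyed but the record is still tracked.
  if (incomingDead && outgoingDead) return 'zombie';
  // Exactly one side destroyed.
  if (incomingDead || outgoingDead) return 'half-zombie';

  const quiet = record.timeSinceLastActivity >= oldAfterMs;
  // Data came in, nothing was sent back, and nothing has happened since.
  if (quiet && record.bytesReceived > 0 && record.bytesSent === 0) return 'stuck';
  // Both sides alive, but no recent activity.
  if (quiet) return 'idle';
  return 'active';
}

console.log(
  classifyConnection({
    incomingState: { destroyed: false },
    outgoingState: { destroyed: false },
    bytesReceived: 36,
    bytesSent: 0,
    timeSinceLastActivity: 60000,
  })
);
```

The order of the checks matters: socket state is tested first because a destroyed socket makes the byte counters and activity timestamps irrelevant.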
Common Patterns
1. Hanging Backend Pattern
{
  "bytesReceived": 36,
  "bytesSent": 0,
  "age": 120000,
  "targetHost": "backend.example.com",
  "incomingState": { "destroyed": false },
  "outgoingState": { "destroyed": false }
}
Fix: The stuck connection detection (60s timeout) should clean these up.
2. Zombie Connection Pattern
{
  "incomingState": { "destroyed": true },
  "outgoingState": { "destroyed": true },
  "connectionClosed": false
}
Fix: The zombie detection should clean these up within 30s.
3. Event Listener Leak Pattern
{
  "incomingListeners": {
    "data": 15,
    "error": 20,
    "close": 18
  }
}
Issue: Event listeners accumulating, potential memory leak.
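To spot this pattern across many records, the listener counts can be compared against a limit. The sketch below uses 10 as the default, which matches Node.js's default `maxListeners` value (the point at which Node itself prints a possible-leak warning). `findListenerLeaks` is a hypothetical helper, and the right limit for a given proxy setup is an assumption to tune.

```javascript
// Hypothetical helper: returns the events whose listener count exceeds the limit.
function findListenerLeaks(listenerCounts, maxPerEvent = 10) {
  return Object.entries(listenerCounts)
    .filter(([, count]) => count > maxPerEvent)
    .map(([event, count]) => ({ event, count }));
}

// Using the counts from the example record above:
const leaks = findListenerLeaks({ data: 15, error: 20, close: 18 });
console.log(leaks.map(({ event, count }) => `${event}: ${count}`).join(', '));
```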
4. No Outgoing Socket Pattern
{
  "outgoingState": { "exists": false },
  "connectionClosed": false,
  "age": 5000
}
Issue: Connection setup failed but cleanup didn't trigger.
Forcing Diagnostic Capture
To capture current state immediately:
monitor.forceCaptureNow();
This is useful when you notice accumulation starting.
Automated Analysis
The monitor automatically analyzes patterns and logs:
- Zombie/half-zombie counts
- Stuck connection counts
- Old connection counts
- Memory usage
- Recommendations
Integration Example
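// In your proxy startup script
import { SmartProxy } from '@push.rocks/smartproxy';
import ProductionConnectionMonitor from './production-connection-monitor.js';

async function startProxyWithMonitoring() {
  const proxy = new SmartProxy({
    // your config
  });
  await proxy.start();

  // Start monitoring (check every 5 seconds)
  const monitor = new ProductionConnectionMonitor(proxy);
  monitor.start(5000);

  // Optional: capture on demand, e.g. with `kill -USR1 <pid>`.
  // Note: Node.js reserves SIGUSR1 for starting the debugger, so a custom
  // listener may interfere with it; SIGUSR2 may be a safer choice.
  process.on('SIGUSR1', () => {
    console.log('Manual diagnostic capture triggered');
    monitor.forceCaptureNow();
  });

  // Graceful shutdown
  process.on('SIGTERM', async () => {
    monitor.stop();
    await proxy.stop();
    process.exit(0);
  });
}

// Run the startup routine and exit with a non-zero code if it fails
startProxyWithMonitoring().catch((error) => {
  console.error('Failed to start proxy with monitoring:', error);
  process.exit(1);
});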
Troubleshooting
Monitor Not Detecting Accumulation
- Check threshold settings (default: 50 connections)
- Reduce check interval for faster detection
- Use forceCaptureNow() to capture current state
Too Many False Positives
- Increase accumulation threshold
- Increase spike threshold
- Adjust check interval
Missing Diagnostic Data
- Ensure output directory exists and is writable
- Check disk space
- Verify process has write permissions
Next Steps
- Deploy the monitor to production
- Wait for accumulation to occur
- Share diagnostic files for analysis
- Apply targeted fixes based on patterns found
The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.