# Production Connection Monitoring

This document explains how to use the `ProductionConnectionMonitor` to diagnose connection accumulation issues in real time.

## Quick Start

```typescript
import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';

// After starting your proxy
const monitor = new ProductionConnectionMonitor(proxy);
monitor.start(5000); // Check every 5 seconds

// The monitor will automatically capture diagnostics when:
// - Connections exceed 50 (default threshold)
// - A sudden spike of 20+ connections occurs
// - You manually call monitor.forceCaptureNow()
```
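
The two automatic triggers listed in the comments above can be pictured as a small pure function. The following is a minimal sketch of that idea, not the monitor's actual implementation: `shouldCapture`, `CaptureThresholds`, and the reason strings are hypothetical names, and whether the real comparisons are strict or inclusive is an assumption. Only the default values (50 connections, a jump of 20) come from this document.

```typescript
// Hypothetical names throughout; only the defaults 50 and 20 come from this document.
interface CaptureThresholds {
  accumulation: number; // capture when the connection count exceeds this value
  spike: number;        // capture when the count grows by at least this much between checks
}

const DEFAULT_THRESHOLDS: CaptureThresholds = { accumulation: 50, spike: 20 };

type CaptureReason = 'threshold_exceeded' | 'sudden_spike' | null;

// Returns the reason a capture would be triggered, or null if no trigger applies.
function shouldCapture(
  previousCount: number,
  currentCount: number,
  thresholds: CaptureThresholds = DEFAULT_THRESHOLDS,
): CaptureReason {
  if (currentCount > thresholds.accumulation) return 'threshold_exceeded';
  if (currentCount - previousCount >= thresholds.spike) return 'sudden_spike';
  return null;
}

console.log(shouldCapture(10, 12)); // null
console.log(shouldCapture(10, 35)); // sudden_spike
console.log(shouldCapture(45, 51)); // threshold_exceeded
```

Keeping the decision separate from the timer makes it easy to try out different thresholds against recorded connection counts before changing anything in production.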

## What Gets Captured

When accumulation is detected, the monitor saves a JSON file with:

### Connection Details
- Socket states (destroyed, readable, writable, readyState)
- Connection age and activity timestamps
- Data transfer statistics (bytes sent/received)
- Target host and port information
- Keep-alive status
- Event listener counts

### System State
- Memory usage
- Event loop lag
- Connection count trends
- Termination statistics

## Reading Diagnostic Files

Files are saved to `.nogit/connection-diagnostics/` with names like:
```
accumulation_2025-06-07T20-20-43-733Z_force_capture.json
```
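
To sort or filter captures by time, the file name can be parsed back into a timestamp and a trigger reason. The sketch below assumes the naming scheme is `accumulation_<timestamp>_<reason>.json`, where the timestamp is an ISO 8601 string whose `:` and `.` were replaced by `-`. That scheme is inferred from the single example above, and `parseDiagnosticFileName` is a hypothetical helper, not part of the monitor.

```typescript
interface DiagnosticFileInfo {
  capturedAt: Date; // capture time, reconstructed from the file name
  reason: string;   // trailing part of the name, e.g. "force_capture"
}

// Assumed pattern: accumulation_YYYY-MM-DDTHH-mm-ss-SSSZ_<reason>.json
const FILE_NAME_PATTERN =
  /^accumulation_(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})-(\d{3})Z_(.+)\.json$/;

// Returns null when the name does not follow the assumed pattern.
function parseDiagnosticFileName(fileName: string): DiagnosticFileInfo | null {
  const match = FILE_NAME_PATTERN.exec(fileName);
  if (match === null) return null;
  const [, date, hours, minutes, seconds, millis, reason] = match;
  // Restore the ':' and '.' separators so Date can parse the ISO string.
  const capturedAt = new Date(`${date}T${hours}:${minutes}:${seconds}.${millis}Z`);
  return { capturedAt, reason };
}

const parsed = parseDiagnosticFileName(
  'accumulation_2025-06-07T20-20-43-733Z_force_capture.json',
);
console.log(parsed?.capturedAt.toISOString()); // 2025-06-07T20:20:43.733Z
console.log(parsed?.reason);                   // force_capture
```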

### Key Fields to Check

1. **Socket States**
   ```json
   "incomingState": {
     "destroyed": false,
     "readable": true,
     "writable": true,
     "readyState": "open"
   }
   ```
   - Both sockets (`incomingState` and `outgoingState`) destroyed = zombie connection
   - Only one of the two destroyed = half-zombie
   - Both alive but old = potential stuck connection

2. **Data Transfer**
   ```json
   "bytesReceived": 36,
   "bytesSent": 0,
   "timeSinceLastActivity": 60000
   ```
   - Bytes received but none sent back (`bytesSent` is 0) = stuck connection
   - High byte counts on an old connection = slow backend
   - No recent activity (large `timeSinceLastActivity`) = idle connection

3. **Connection Flags**
   ```json
   "hasReceivedInitialData": false,
   "hasKeepAlive": true,
   "connectionClosed": false
   ```
   - `hasReceivedInitialData=false` on non-TLS = immediate routing
   - `hasKeepAlive=true` = extended timeout applies
   - `connectionClosed=false` = still tracked
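
The socket-state and data-transfer rules in items 1 and 2 can be written down as two small functions, which helps when skimming many records. This is a sketch under stated assumptions: the field names follow the JSON snippets in this document, `age` and `timeSinceLastActivity` are assumed to be in milliseconds, the 60-second cutoffs are illustrative, and `classifySockets` and `describeTransfer` are hypothetical helpers rather than monitor APIs. The "slow backend" rule is left out because the text gives no cutoff for "high bytes".

```typescript
interface SocketState {
  destroyed: boolean;
}

interface ConnectionSnapshot {
  incomingState: SocketState;
  outgoingState: SocketState;
  age: number;                   // assumed to be milliseconds
  bytesReceived: number;
  bytesSent: number;
  timeSinceLastActivity: number; // assumed to be milliseconds
}

type SocketVerdict = 'zombie' | 'half-zombie' | 'possibly-stuck' | 'ok';

const OLD_CONNECTION_MS = 60000; // illustrative cutoff for "old"
const IDLE_MS = 60000;           // illustrative cutoff for "no activity"

// Item 1: interpret the destroyed flags of both sockets, then the age.
function classifySockets(c: ConnectionSnapshot): SocketVerdict {
  const incomingDead = c.incomingState.destroyed;
  const outgoingDead = c.outgoingState.destroyed;
  if (incomingDead && outgoingDead) return 'zombie';
  if (incomingDead || outgoingDead) return 'half-zombie';
  return c.age > OLD_CONNECTION_MS ? 'possibly-stuck' : 'ok';
}

// Item 2: interpret the byte counters and the time since the last activity.
function describeTransfer(c: ConnectionSnapshot): string {
  if (c.bytesReceived > 0 && c.bytesSent === 0) return 'stuck: nothing sent back';
  if (c.timeSinceLastActivity > IDLE_MS) return 'idle: no recent activity';
  return 'active';
}

const sample: ConnectionSnapshot = {
  incomingState: { destroyed: false },
  outgoingState: { destroyed: false },
  age: 120000,
  bytesReceived: 36,
  bytesSent: 0,
  timeSinceLastActivity: 60000,
};
console.log(classifySockets(sample));  // possibly-stuck
console.log(describeTransfer(sample)); // stuck: nothing sent back
```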

## Common Patterns

### 1. Hanging Backend Pattern
```json
{
  "bytesReceived": 36,
  "bytesSent": 0,
  "age": 120000,
  "targetHost": "backend.example.com",
  "incomingState": { "destroyed": false },
  "outgoingState": { "destroyed": false }
}
```
**Fix**: The stuck connection detection (60s timeout) should clean these up.

### 2. Zombie Connection Pattern
```json
{
  "incomingState": { "destroyed": true },
  "outgoingState": { "destroyed": true },
  "connectionClosed": false
}
```
**Fix**: The zombie detection should clean these up within 30s.

### 3. Event Listener Leak Pattern
```json
{
  "incomingListeners": {
    "data": 15,
    "error": 20,
    "close": 18
  }
}
```
**Issue**: Event listeners are accumulating on the socket, which points to a potential memory leak.

### 4. No Outgoing Socket Pattern
```json
{
  "outgoingState": { "exists": false },
  "connectionClosed": false,
  "age": 5000
}
```
**Issue**: Connection setup failed but cleanup didn't trigger.
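
When a diagnostic file contains many connections, the four patterns above can be counted mechanically. The following is a rough sketch, not the monitor's own analysis: `matchPatterns` and `countPatterns` are hypothetical helpers, the record type only covers the fields shown in the snippets, the 60-second cutoff mirrors the stuck-connection timeout mentioned above, and the listener limit of 10 is borrowed from Node's default `EventEmitter` warning threshold as an assumption.

```typescript
interface SocketInfo {
  exists?: boolean;    // only present in the "no outgoing socket" snippet
  destroyed?: boolean;
}

interface ConnectionRecord {
  bytesReceived?: number;
  bytesSent?: number;
  age?: number; // assumed to be milliseconds
  connectionClosed?: boolean;
  incomingState?: SocketInfo;
  outgoingState?: SocketInfo;
  incomingListeners?: Record<string, number>;
}

type Pattern = 'hanging-backend' | 'zombie' | 'listener-leak' | 'no-outgoing-socket';

const STUCK_AFTER_MS = 60000; // mirrors the 60s stuck-connection timeout
const MAX_LISTENERS = 10;     // assumption: Node's default max-listener warning limit

// Returns every pattern a single record matches (possibly none).
function matchPatterns(r: ConnectionRecord): Pattern[] {
  const found: Pattern[] = [];
  const incomingDead = r.incomingState?.destroyed === true;
  const outgoingDead = r.outgoingState?.destroyed === true;
  const outgoingMissing = r.outgoingState?.exists === false;
  const stillTracked = r.connectionClosed === false;

  if (incomingDead && outgoingDead && stillTracked) found.push('zombie');
  if (outgoingMissing && stillTracked) found.push('no-outgoing-socket');
  if (
    !incomingDead && !outgoingDead && !outgoingMissing &&
    (r.bytesReceived ?? 0) > 0 && (r.bytesSent ?? 0) === 0 &&
    (r.age ?? 0) > STUCK_AFTER_MS
  ) {
    found.push('hanging-backend');
  }
  const listenerCounts = Object.values(r.incomingListeners ?? {});
  if (listenerCounts.some((count) => count > MAX_LISTENERS)) found.push('listener-leak');
  return found;
}

// Tallies how often each pattern occurs across all records of a capture.
function countPatterns(records: ConnectionRecord[]): Record<Pattern, number> {
  const counts: Record<Pattern, number> = {
    'hanging-backend': 0,
    'zombie': 0,
    'listener-leak': 0,
    'no-outgoing-socket': 0,
  };
  for (const record of records) {
    for (const pattern of matchPatterns(record)) counts[pattern] += 1;
  }
  return counts;
}

console.log(matchPatterns({ outgoingState: { exists: false }, connectionClosed: false, age: 5000 }));
```

In practice the records would come from `JSON.parse` on a saved diagnostic file; the exact location of the connection list inside that file is not specified here, so it has to be checked against a real capture.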

## Forcing Diagnostic Capture

To capture current state immediately:
```typescript
monitor.forceCaptureNow();
```

This is useful when you notice accumulation starting.

## Automated Analysis

The monitor automatically analyzes patterns and logs:
- Zombie/half-zombie counts
- Stuck connection counts
- Old connection counts
- Memory usage
- Recommendations

## Integration Example

```typescript
// In your proxy startup script
import { SmartProxy } from '@push.rocks/smartproxy';
import ProductionConnectionMonitor from './production-connection-monitor.js';

async function startProxyWithMonitoring() {
  const proxy = new SmartProxy({
    // your config
  });

  await proxy.start();

  // Start monitoring
  const monitor = new ProductionConnectionMonitor(proxy);
  monitor.start(5000);

  // Optional: capture on demand when the process receives SIGUSR1.
  // Note: Node.js reserves SIGUSR1 for starting its debugger, so a custom
  // listener may interfere with it; SIGUSR2 is an alternative.
  process.on('SIGUSR1', () => {
    console.log('Manual diagnostic capture triggered');
    monitor.forceCaptureNow();
  });

  // Graceful shutdown
  process.on('SIGTERM', async () => {
    monitor.stop();
    await proxy.stop();
    process.exit(0);
  });
}

// Start the proxy and exit with a non-zero code if startup fails
startProxyWithMonitoring().catch((err) => {
  console.error('Failed to start proxy with monitoring:', err);
  process.exit(1);
});
```

## Troubleshooting

### Monitor Not Detecting Accumulation
- Check threshold settings (default: 50 connections)
- Reduce check interval for faster detection
- Use `forceCaptureNow()` to capture current state

### Too Many False Positives
- Increase accumulation threshold
- Increase spike threshold
- Adjust check interval

### Missing Diagnostic Data
- Ensure output directory exists and is writable
- Check disk space
- Verify process has write permissions

## Next Steps

1. Deploy the monitor to production
2. Wait for accumulation to occur
3. Share diagnostic files for analysis
4. Apply targeted fixes based on patterns found

The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.