SmartProxy Development Hints
Byte Tracking and Metrics
Throughput Drift Issue (Fixed)
Problem: Throughput numbers were gradually increasing over time for long-lived connections.
Root Cause: The `byRoute()` and `byIP()` methods were dividing cumulative total bytes (since connection start) by the window duration, causing rates to appear higher as connections aged:
- Hour 1: 1GB total / 60s = 17 MB/s ✓
- Hour 2: 2GB total / 60s = 34 MB/s ✗ (appears doubled!)
- Hour 3: 3GB total / 60s = 50 MB/s ✗ (keeps rising!)
Solution: Implemented dedicated ThroughputTracker instances for each route and IP address:
- Each route and IP gets its own throughput tracker with per-second sampling
- Samples are taken every second and stored in a circular buffer
- Rate calculations use actual samples within the requested window
- Default window is now 1 second for real-time accuracy
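The fix above can be sketched as a tracker that separates "bytes recorded since the last sample" from the historical samples, so old traffic never inflates the current rate. This is an illustrative sketch, not SmartProxy's actual `ThroughputTracker` source; the method names `recordBytes`, `sample`, and `getRate` are assumptions.

```typescript
// Sketch of a per-second sampling throughput tracker with a circular buffer.
class ThroughputTracker {
  private samples: number[];   // circular buffer of bytes-per-second samples
  private head = 0;            // next write position in the buffer
  private filled = 0;          // number of valid samples so far
  private accumulator = 0;     // bytes recorded since the last sample

  constructor(retentionSeconds = 3600) {
    this.samples = new Array(retentionSeconds).fill(0);
  }

  // Called from the data path whenever bytes flow.
  recordBytes(count: number): void {
    this.accumulator += count;
  }

  // Called once per second by a timer: move the accumulator into the buffer.
  sample(): void {
    this.samples[this.head] = this.accumulator;
    this.accumulator = 0;
    this.head = (this.head + 1) % this.samples.length;
    this.filled = Math.min(this.filled + 1, this.samples.length);
  }

  // Average bytes/second over the last `windowSeconds` samples (default 1s).
  getRate(windowSeconds = 1): number {
    const n = Math.min(windowSeconds, this.filled);
    if (n === 0) return 0;
    let total = 0;
    for (let i = 1; i <= n; i++) {
      const idx = (this.head - i + this.samples.length) % this.samples.length;
      total += this.samples[idx];
    }
    return total / n;
  }
}

// A steady 1 MB/s stream reports 1 MB/s regardless of how old the connection is.
const tracker = new ThroughputTracker(60);
for (let s = 0; s < 10; s++) { tracker.recordBytes(1_000_000); tracker.sample(); }
```

Because each sample only holds the bytes of its own second, dividing by the window length stays correct no matter how long the connection lives.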
What Gets Counted (Network Interface Throughput)
The byte tracking is designed to match network interface throughput (what Unifi/network monitoring tools show):
Counted bytes include:
- All application data
- TLS handshakes and protocol overhead
- TLS record headers and encryption padding
- HTTP headers and protocol data
- WebSocket frames and protocol overhead
- TLS alerts sent to clients
NOT counted:
- PROXY protocol headers (sent to backend, not client)
- TCP/IP headers (handled by OS, not visible at application layer)
Byte direction:
- `bytesReceived`: All bytes received FROM the client on the incoming connection
- `bytesSent`: All bytes sent TO the client on the incoming connection
- Backend connections are separate and not mixed with client metrics
Double Counting Issue (Fixed)
Problem: Initial data chunks were being counted twice in the byte tracking:
- Once when stored in `pendingData` in `setupDirectConnection()`
- Again when the data flowed through bidirectional forwarding
Solution: Removed the byte counting when storing initial chunks. Bytes are now only counted when they actually flow through the `setupBidirectionalForwarding()` callbacks.
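The single-counting pattern can be illustrated with a minimal sketch: bytes are tallied only inside the forwarding callbacks, and buffered initial chunks are replayed through that same counted path. The `Conn` shape, the `recordBytes` callback, and the replay mechanics here are assumptions for illustration, not SmartProxy's actual types.

```typescript
import { EventEmitter } from "node:events";

// Illustrative connection record: cumulative counters plus buffered early data.
interface Conn { bytesReceived: number; bytesSent: number; pendingData: Buffer[]; }

function setupBidirectionalForwarding(
  conn: Conn,
  incoming: EventEmitter,   // stands in for the client socket
  outgoing: EventEmitter,   // stands in for the backend socket
  recordBytes: (rx: number, tx: number) => void, // metrics collector hook
): void {
  incoming.on("data", (chunk: Buffer) => {
    conn.bytesReceived += chunk.length;  // update the connection record
    recordBytes(chunk.length, 0);        // AND the metrics collector, once
  });
  outgoing.on("data", (chunk: Buffer) => {
    conn.bytesSent += chunk.length;
    recordBytes(0, chunk.length);
  });
  // Replay buffered initial chunks *through* the counted path, so they are
  // counted exactly once instead of once at buffering and once at forwarding.
  for (const chunk of conn.pendingData.splice(0)) incoming.emit("data", chunk);
}

// Demo: a 100-byte buffered initial chunk, then live 50-byte and 20-byte chunks.
let rx = 0, tx = 0;
const conn: Conn = { bytesReceived: 0, bytesSent: 0, pendingData: [Buffer.alloc(100)] };
const client = new EventEmitter(), backend = new EventEmitter();
setupBidirectionalForwarding(conn, client, backend, (r, t) => { rx += r; tx += t; });
client.emit("data", Buffer.alloc(50));
backend.emit("data", Buffer.alloc(20));
```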
HttpProxy Metrics (Fixed)
Problem: HttpProxy forwarding was updating connection record byte counts but not calling `metricsCollector.recordBytes()`, resulting in missing throughput data.
Solution: Added `metricsCollector.recordBytes()` calls to the HttpProxy bidirectional forwarding callbacks.
Metrics Architecture
The metrics system has multiple layers:
- Connection Records (`record.bytesReceived/bytesSent`): Track total bytes per connection
- Global ThroughputTracker: Accumulates bytes between samples for overall rate calculations
- Per-Route ThroughputTrackers: Dedicated tracker for each route with per-second sampling
- Per-IP ThroughputTrackers: Dedicated tracker for each IP with per-second sampling
- connectionByteTrackers: Track cumulative bytes and metadata for active connections
Key features:
- All throughput trackers sample every second (1Hz)
- Each tracker maintains a circular buffer of samples (default: 1 hour retention)
- Rate calculations are accurate for any requested window (default: 1 second)
- All byte counting happens exactly once at the data flow point
- Unused route/IP trackers are automatically cleaned up when connections close
Understanding "High" Byte Counts
If byte counts seem high compared to actual application data, remember:
- TLS handshakes can be 1-5KB depending on cipher suites and certificates
- Each TLS record has 5 bytes of header overhead
- TLS encryption adds 16-48 bytes of padding/MAC per record
- HTTP/2 has additional framing overhead
- WebSocket has frame headers (2-14 bytes per message)
This overhead is real network traffic and should be counted for accurate throughput metrics.
Byte Counting Paths
There are two mutually exclusive paths for connections:
- Direct forwarding (route-connection-handler.ts):
  - Used for TCP passthrough, TLS passthrough, and direct connections
  - Bytes counted in `setupBidirectionalForwarding` callbacks
  - Initial chunk NOT counted separately (flows through bidirectional forwarding)
- HttpProxy forwarding (http-proxy-bridge.ts):
  - Used for TLS termination (terminate, terminate-and-reencrypt)
  - Initial chunk counted when written to proxy
  - All subsequent bytes counted in `setupBidirectionalForwarding` callbacks
  - This is the ONLY counting point for these connections
Byte Counting Audit (2025-01-06)
A comprehensive audit was performed to verify byte counting accuracy:
Audit Results:
- ✅ No double counting detected in any connection flow
- ✅ Each byte counted exactly once in each direction
- ✅ Connection records and metrics updated consistently
- ✅ PROXY protocol headers correctly excluded from client metrics
- ✅ NFTables forwarded connections correctly not counted (kernel handles)
Key Implementation Points:
- All byte counting happens in only 2 files: `route-connection-handler.ts` and `http-proxy-bridge.ts`
- Both use the same pattern: increment `record.bytesReceived/Sent` AND call `metricsCollector.recordBytes()`
- Initial chunks handled correctly: stored but not counted until forwarded
- TLS alerts counted as sent bytes (correct - they are sent to client)
For full audit details, see readme.byte-counting-audit.md
Connection Cleanup
Zombie Connection Detection
The connection manager performs comprehensive zombie detection every 10 seconds:
- Full zombies: Both incoming and outgoing sockets destroyed but connection not cleaned up
- Half zombies: One socket destroyed, grace period expired (5 minutes for TLS, 30 seconds for non-TLS)
- Stuck connections: Data received but none sent back after threshold (5 minutes for TLS, 60 seconds for non-TLS)
Cleanup Queue
Connections are cleaned up through a batched queue system:
- Batch size: 100 connections
- Processing triggered immediately when batch size reached
- Otherwise processed after 100ms delay
- Prevents overwhelming the system during mass disconnections
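The batching behavior described above can be sketched as a small queue class: a full batch flushes immediately, a partial batch flushes after a short delay. The class and method names are illustrative; only the batch size (100) and delay (100ms) come from the text.

```typescript
// Sketch of a batched connection-cleanup queue.
class CleanupQueue {
  private queue: string[] = [];
  private timer: NodeJS.Timeout | null = null;
  public processed: string[][] = [];  // batches handed to the cleanup routine

  constructor(private batchSize = 100, private delayMs = 100) {}

  enqueue(connectionId: string): void {
    this.queue.push(connectionId);
    if (this.queue.length >= this.batchSize) {
      this.flush();  // full batch: process immediately
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.delayMs);
      this.timer.unref?.();  // don't keep the process alive for a pending flush
    }
  }

  private flush(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    const batch = this.queue.splice(0, this.batchSize);
    if (batch.length > 0) this.processed.push(batch);
  }
}

// Demo: 100 enqueues trigger one immediate full-batch flush.
const q = new CleanupQueue(100, 100);
for (let i = 0; i < 100; i++) q.enqueue(`conn-${i}`);
```

Capping each flush at one batch is what smooths out mass disconnections: a burst of thousands of closes becomes a series of bounded processing steps.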
Keep-Alive Handling
Keep-alive connections receive special treatment based on the `keepAliveTreatment` setting:
- standard: Normal timeout applies
- extended: Timeout multiplied by `keepAliveInactivityMultiplier` (default 6x)
- immortal: No timeout, connections persist indefinitely
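A minimal sketch of how the three treatments could map to an effective inactivity timeout (the function name is illustrative; the multiplier default of 6 comes from the text):

```typescript
type KeepAliveTreatment = "standard" | "extended" | "immortal";

// Returns the inactivity timeout in ms, or null for "no timeout".
function effectiveTimeout(
  baseTimeoutMs: number,
  treatment: KeepAliveTreatment,
  keepAliveInactivityMultiplier = 6,
): number | null {
  switch (treatment) {
    case "standard": return baseTimeoutMs;                                  // normal timeout
    case "extended": return baseTimeoutMs * keepAliveInactivityMultiplier;  // stretched timeout
    case "immortal": return null;  // connection persists indefinitely
  }
}
```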
PROXY Protocol
The system supports both receiving and sending PROXY protocol:
- Receiving: Automatically detected from trusted proxy IPs (configured in `proxyIPs`)
- Sending: Enabled per-route or globally via the `sendProxyProtocol` setting
- Real client IP is preserved and used for all connection tracking and security checks
Metrics and Throughput Calculation
The metrics system tracks throughput using per-second sampling:
- Byte Recording: Bytes are recorded as data flows through connections
- Sampling: Every second, accumulated bytes are stored as a sample
- Rate Calculation: Throughput is calculated by summing bytes over a time window
- Per-Route/IP Tracking: Separate ThroughputTracker instances for each route and IP
Key implementation details:
- Bytes are recorded in the bidirectional forwarding callbacks
- The `instant()` method returns throughput over the last 1 second
- The `recent()` method returns throughput over the last 10 seconds
- Custom windows can be specified for different averaging periods
Throughput Spikes Issue
There's a fundamental difference between application-layer and network-layer throughput:
Application Layer (what we measure):
- Bytes are recorded when delivered to/from the application
- Large chunks can arrive "instantly" due to kernel/Node.js buffering
- Shows spikes when buffers are flushed (e.g., 20MB in 1 second = 160 Mbit/s)
Network Layer (what Unifi shows):
- Actual packet flow through the network interface
- Limited by physical network speed (e.g., 20 Mbit/s)
- Data transfers over time, not in bursts
The spikes occur because:
- Data flows over network at 20 Mbit/s (takes 8 seconds for 20MB)
- Kernel/Node.js buffers this incoming data
- When buffer is flushed, application receives large chunk at once
- We record entire chunk in current second, creating artificial spike
Potential Solutions:
- Use longer window for "instant" measurements (e.g., 5 seconds instead of 1)
- Track socket write backpressure to estimate actual network flow
- Implement bandwidth estimation based on connection duration
- Accept that application-layer != network-layer throughput
Connection Limiting
Per-IP Connection Limits
- SmartProxy tracks connections per IP address in the SecurityManager
- Default limit is 100 connections per IP (configurable via `maxConnectionsPerIP`)
- Connection rate limiting is also enforced (default 300 connections/minute per IP)
- HttpProxy has been enhanced to also enforce per-IP limits when forwarding from SmartProxy
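The two per-IP checks above (concurrent cap plus sliding-window rate limit) can be sketched together; the class and method names are illustrative, and only the defaults (100 concurrent, 300/minute) come from the text.

```typescript
// Sketch of per-IP connection limiting with a sliding one-minute rate window.
class IpLimiter {
  private active = new Map<string, number>();       // ip -> open connections
  private attempts = new Map<string, number[]>();   // ip -> attempt timestamps (ms)

  constructor(private maxConnectionsPerIP = 100, private maxPerMinute = 300) {}

  // Returns true if the connection is allowed, false if either limit is hit.
  tryConnect(ip: string, now = Date.now()): boolean {
    const recent = (this.attempts.get(ip) ?? []).filter(t => now - t < 60_000);
    if (recent.length >= this.maxPerMinute) return false;          // rate limit
    if ((this.active.get(ip) ?? 0) >= this.maxConnectionsPerIP) return false; // concurrency limit
    recent.push(now);
    this.attempts.set(ip, recent);  // dropping old timestamps here also bounds memory
    this.active.set(ip, (this.active.get(ip) ?? 0) + 1);
    return true;
  }

  disconnect(ip: string): void {
    const n = (this.active.get(ip) ?? 1) - 1;
    n <= 0 ? this.active.delete(ip) : this.active.set(ip, n);
  }
}

// Demo with a concurrency cap of 2 for brevity.
const limiter = new IpLimiter(2, 300);
const first = limiter.tryConnect("192.168.1.100");
const second = limiter.tryConnect("192.168.1.100");
const third = limiter.tryConnect("192.168.1.100");  // over the concurrency cap
limiter.disconnect("192.168.1.100");
const afterClose = limiter.tryConnect("192.168.1.100");
```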
Route-Level Connection Limits
- Routes can define `security.maxConnections` to limit connections per route
- ConnectionManager tracks connections by route ID using a separate Map
- Limits are enforced in RouteConnectionHandler before forwarding
- Connection is tracked when a route is matched: `trackConnectionByRoute(routeId, connectionId)`
HttpProxy Integration
- When SmartProxy forwards to HttpProxy for TLS termination, it sends a `CLIENT_IP:<ip>\r\n` header
- HttpProxy parses this header to track the real client IP, not the localhost IP
- This ensures per-IP limits are enforced even for forwarded connections
- The header is parsed in the connection handler before any data processing
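Peeling that header off the first chunk before any other processing could look like the following sketch; the exact framing beyond the `CLIENT_IP:<ip>\r\n` shape stated above is an assumption, and the function name is illustrative.

```typescript
// Extract the forwarded client IP from the first data chunk, if present.
function extractClientIp(firstChunk: Buffer): { clientIp: string | null; rest: Buffer } {
  const prefix = "CLIENT_IP:";
  const terminator = firstChunk.indexOf("\r\n");
  if (terminator !== -1 && firstChunk.subarray(0, prefix.length).toString() === prefix) {
    const clientIp = firstChunk.subarray(prefix.length, terminator).toString();
    // Strip the header so only real application data is processed further.
    return { clientIp, rest: firstChunk.subarray(terminator + 2) };
  }
  return { clientIp: null, rest: firstChunk };  // no header: treat chunk as-is
}

const { clientIp, rest } = extractClientIp(
  Buffer.from("CLIENT_IP:192.168.1.50\r\nGET / HTTP/1.1"),
);
```

Using the extracted IP (rather than the localhost source address of the internal hop) is what lets per-IP limits keep working for forwarded connections.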
Memory Optimization
- Periodic cleanup runs every 60 seconds to remove:
  - IPs with no active connections
  - Expired rate limit timestamps (older than 1 minute)
- Prevents memory accumulation from many unique IPs over time
- Cleanup is automatic and runs in the background with `unref()` so it does not keep the process alive
Connection Cleanup Queue
- Cleanup queue processes connections in batches to prevent overwhelming the system
- Race condition prevention using an `isProcessingCleanup` flag
- A try-finally block ensures the flag is always reset even if errors occur
- New connections added during processing are queued for the next batch
Important Implementation Notes
- Always use the `NodeJS.Timeout` type instead of `NodeJS.Timer` for interval/timeout references
- IPv4/IPv6 normalization is handled (e.g., `::ffff:127.0.0.1` and `127.0.0.1` are treated as the same IP)
- Connection limits are checked before route matching to prevent DoS attacks
- SharedSecurityManager supports checking route-level limits via an optional parameter
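The IPv4/IPv6 normalization mentioned above amounts to stripping the IPv4-mapped IPv6 prefix so both forms share one tracking entry. A minimal sketch (the helper name is illustrative):

```typescript
// Normalize IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) to plain IPv4,
// so per-IP maps key both forms identically. Other addresses pass through.
function normalizeIp(ip: string): string {
  const mapped = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/i.exec(ip);
  return mapped ? mapped[1] : ip;
}
```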
Log Deduplication
To reduce log spam during high-traffic scenarios or attacks, SmartProxy implements log deduplication for repetitive events:
How It Works
- Similar log events are batched and aggregated over a 5-second window
- Instead of logging each event individually, a summary is emitted
- Events are grouped by type and deduplicated by key (e.g., IP address, reason)
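The group-by-type-and-key idea can be sketched as a small aggregator; in the real system a 5-second timer drives `flush()`. Class name, method names, and the summary wording here are illustrative.

```typescript
// Sketch of a log deduplicator: count events per (type, key), emit summaries.
class LogDeduplicator {
  private counts = new Map<string, Map<string, number>>(); // type -> key -> count
  public summaries: string[] = [];

  record(eventType: string, key: string): void {
    const byKey = this.counts.get(eventType) ?? new Map<string, number>();
    byKey.set(key, (byKey.get(key) ?? 0) + 1);
    this.counts.set(eventType, byKey);
  }

  // Driven by the flush-interval timer: one summary line per event type.
  flush(): void {
    for (const [eventType, byKey] of this.counts) {
      const total = [...byKey.values()].reduce((a, b) => a + b, 0);
      const parts = [...byKey].map(([k, n]) => `${k}: ${n}`).join(", ");
      this.summaries.push(`[SUMMARY] ${eventType}: ${total} events (${parts})`);
    }
    this.counts.clear();  // start the next aggregation window empty
  }
}

// Demo: four rejections collapse into a single summary line.
const dedup = new LogDeduplicator();
dedup.record("connection-rejected", "global-limit");
dedup.record("connection-rejected", "global-limit");
dedup.record("connection-rejected", "global-limit");
dedup.record("connection-rejected", "route-limit");
dedup.flush();
```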
Deduplicated Event Types
- Connection Rejections (`connection-rejected`):
  - Groups by rejection reason (global-limit, route-limit, etc.)
  - Example: "Rejected 150 connections (reasons: global-limit: 100, route-limit: 50)"
- IP Rejections (`ip-rejected`):
  - Groups by IP address
  - Shows top offenders with rejection counts and reasons
  - Example: "Rejected 500 connections from 10 IPs (top offenders: 192.168.1.100 (200x, rate-limit), ...)"
- Connection Cleanups (`connection-cleanup`):
  - Groups by cleanup reason (normal, timeout, error, zombie, etc.)
  - Example: "Cleaned up 250 connections (reasons: normal: 200, timeout: 30, error: 20)"
- IP Tracking Cleanup (`ip-cleanup`):
  - Summarizes periodic IP cleanup operations
  - Example: "IP tracking cleanup: removed 50 entries across 5 cleanup cycles"
Configuration
- Default flush interval: 5 seconds
- Maximum batch size: 100 events (triggers immediate flush)
- Global periodic flush: Every 10 seconds (ensures logs are emitted regularly)
- Process exit handling: Logs are flushed on SIGINT/SIGTERM
Benefits
- Reduces log volume during attacks or high traffic
- Provides better overview of patterns (e.g., which IPs are attacking)
- Improves log readability and analysis
- Prevents log storage overflow
- Maintains detailed information in aggregated form
Log Output Examples
Instead of hundreds of individual logs:
Connection rejected
Connection rejected
Connection rejected
... (repeated 500 times)
You'll see:
[SUMMARY] Rejected 500 connections from 10 IPs in 5s (top offenders: 192.168.1.100 (200x, rate-limit), 10.0.0.1 (150x, per-ip-limit))
Instead of:
Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 266
Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 265
... (repeated 266 times)
You'll see:
[SUMMARY] 266 HttpProxy connections terminated in 5s (reasons: client_closed: 266, activeConnections: 0)
Rapid Event Handling
- During attacks or high-volume scenarios, logs are flushed more frequently
- If 50+ events occur within 1 second, immediate flush is triggered
- Prevents memory buildup during flooding attacks
- Maintains real-time visibility during incidents