SmartProxy Development Hints

Byte Tracking and Metrics

Throughput Drift Issue (Fixed)

Problem: Throughput numbers were gradually increasing over time for long-lived connections.

Root Cause: The byRoute() and byIP() methods were dividing cumulative total bytes (since connection start) by the window duration, causing rates to appear higher as connections aged:

  • Hour 1: 1GB total / 60s = 17 MB/s ✓
  • Hour 2: 2GB total / 60s = 34 MB/s ✗ (appears doubled!)
  • Hour 3: 3GB total / 60s = 51 MB/s ✗ (keeps rising!)

Solution: Implemented dedicated ThroughputTracker instances for each route and IP address (sketched after the list below):

  • Each route and IP gets its own throughput tracker with per-second sampling
  • Samples are taken every second and stored in a circular buffer
  • Rate calculations use actual samples within the requested window
  • Default window is now 1 second for real-time accuracy
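
A minimal sketch of such a tracker, with illustrative names and shapes (not SmartProxy's actual class):

class ThroughputTracker {
  private samples: { rx: number; tx: number }[];
  private writeIndex = 0;
  private pendingRx = 0;
  private pendingTx = 0;

  constructor(retentionSeconds = 3600) {
    // Circular buffer: one slot per second of retention (default: 1 hour).
    this.samples = Array.from({ length: retentionSeconds }, () => ({ rx: 0, tx: 0 }));
  }

  // Called from the data-flow callbacks as bytes pass through.
  public recordBytes(rx: number, tx: number): void {
    this.pendingRx += rx;
    this.pendingTx += tx;
  }

  // Called once per second (1Hz) by a sampling timer.
  public sample(): void {
    this.samples[this.writeIndex] = { rx: this.pendingRx, tx: this.pendingTx };
    this.writeIndex = (this.writeIndex + 1) % this.samples.length;
    this.pendingRx = 0;
    this.pendingTx = 0;
  }

  // Average bytes/second over the last N samples only, so cumulative
  // totals can no longer inflate the rate of long-lived connections.
  public getRate(windowSeconds = 1): { rx: number; tx: number } {
    const window = Math.min(windowSeconds, this.samples.length);
    let rx = 0;
    let tx = 0;
    for (let i = 1; i <= window; i++) {
      const idx = (this.writeIndex - i + this.samples.length) % this.samples.length;
      rx += this.samples[idx].rx;
      tx += this.samples[idx].tx;
    }
    return { rx: rx / window, tx: tx / window };
  }
}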

What Gets Counted (Network Interface Throughput)

The byte tracking is designed to match network interface throughput (what UniFi and similar network monitoring tools show):

Counted bytes include:

  • All application data
  • TLS handshakes and protocol overhead
  • TLS record headers and encryption padding
  • HTTP headers and protocol data
  • WebSocket frames and protocol overhead
  • TLS alerts sent to clients

NOT counted:

  • PROXY protocol headers (sent to backend, not client)
  • TCP/IP headers (handled by OS, not visible at application layer)

Byte direction:

  • bytesReceived: All bytes received FROM the client on the incoming connection
  • bytesSent: All bytes sent TO the client on the incoming connection
  • Backend connections are separate and not mixed with client metrics

Double Counting Issue (Fixed)

Problem: Initial data chunks were being counted twice in the byte tracking:

  1. Once when stored in pendingData in setupDirectConnection()
  2. Again when the data flowed through bidirectional forwarding

Solution: Removed the byte counting when storing initial chunks. Bytes are now only counted when they actually flow through the setupBidirectionalForwarding() callbacks.

HttpProxy Metrics (Fixed)

Problem: HttpProxy forwarding was updating connection record byte counts but not calling metricsCollector.recordBytes(), resulting in missing throughput data.

Solution: Added metricsCollector.recordBytes() calls to the HttpProxy bidirectional forwarding callbacks.
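
Both fixes converge on the same rule: count bytes exactly once, at the point where data actually flows. A minimal sketch of that pattern (the record and collector shapes are assumptions, not SmartProxy's actual signatures; backpressure and error handling omitted):

import * as net from 'net';

interface ByteRecord { bytesReceived: number; bytesSent: number }

function forwardWithCounting(
  client: net.Socket,
  backend: net.Socket,
  record: ByteRecord,
  recordBytes: (rx: number, tx: number) => void, // feeds the throughput trackers
): void {
  client.on('data', (chunk: Buffer) => {
    record.bytesReceived += chunk.length; // per-connection total
    recordBytes(chunk.length, 0);         // metrics, counted exactly once
    backend.write(chunk);
  });
  backend.on('data', (chunk: Buffer) => {
    record.bytesSent += chunk.length;
    recordBytes(0, chunk.length);
    client.write(chunk);
  });
}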

Metrics Architecture

The metrics system has multiple layers:

  1. Connection Records (record.bytesReceived/bytesSent): Track total bytes per connection
  2. Global ThroughputTracker: Accumulates bytes between samples for overall rate calculations
  3. Per-Route ThroughputTrackers: Dedicated tracker for each route with per-second sampling
  4. Per-IP ThroughputTrackers: Dedicated tracker for each IP with per-second sampling
  5. connectionByteTrackers: Track cumulative bytes and metadata for active connections

Key features:

  • All throughput trackers sample every second (1Hz)
  • Each tracker maintains a circular buffer of samples (default: 1 hour retention)
  • Rate calculations are accurate for any requested window (default: 1 second)
  • All byte counting happens exactly once at the data flow point
  • Unused route/IP trackers are automatically cleaned up when connections close

Understanding "High" Byte Counts

If byte counts seem high compared to actual application data, remember:

  • TLS handshakes can be 1-5KB depending on cipher suites and certificates
  • Each TLS record has 5 bytes of header overhead
  • TLS encryption adds 16-48 bytes of padding/MAC per record
  • HTTP/2 has additional framing overhead
  • WebSocket has frame headers (2-14 bytes per message)

This overhead is real network traffic and should be counted for accurate throughput metrics.
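
To put numbers on it (illustrative figures): a 1 MiB response sent as 64 full-size 16 KiB TLS records adds 64 × (5-byte header + ~32 bytes MAC/padding) ≈ 2.3 KiB, well under 1% overhead, whereas a 1-5 KB handshake in front of a 500-byte API response more than doubles the bytes on the wire.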

Byte Counting Paths

There are two mutually exclusive paths for connections:

  1. Direct forwarding (route-connection-handler.ts):

    • Used for TCP passthrough, TLS passthrough, and direct connections
    • Bytes counted in setupBidirectionalForwarding callbacks
    • Initial chunk NOT counted separately (flows through bidirectional forwarding)
  2. HttpProxy forwarding (http-proxy-bridge.ts):

    • Used for TLS termination (terminate, terminate-and-reencrypt)
    • Initial chunk counted when written to proxy
    • All subsequent bytes counted in setupBidirectionalForwarding callbacks
    • This is the ONLY counting point for these connections

Byte Counting Audit (2025-01-06)

A comprehensive audit was performed to verify byte counting accuracy:

Audit Results:

  • No double counting detected in any connection flow
  • Each byte counted exactly once in each direction
  • Connection records and metrics updated consistently
  • PROXY protocol headers correctly excluded from client metrics
  • NFTables-forwarded connections correctly not counted (the kernel handles the traffic, so it never reaches the application layer)

Key Implementation Points:

  • All byte counting happens in only 2 files: route-connection-handler.ts and http-proxy-bridge.ts
  • Both use the same pattern: increment record.bytesReceived/Sent AND call metricsCollector.recordBytes()
  • Initial chunks handled correctly: stored but not counted until forwarded
  • TLS alerts counted as sent bytes (correct - they are sent to client)

For full audit details, see readme.byte-counting-audit.md

Connection Cleanup

Zombie Connection Detection

The connection manager performs comprehensive zombie detection every 10 seconds (a classification sketch follows the list):

  • Full zombies: Both incoming and outgoing sockets destroyed but connection not cleaned up
  • Half zombies: One socket destroyed, grace period expired (5 minutes for TLS, 30 seconds for non-TLS)
  • Stuck connections: Data received but none sent back after threshold (5 minutes for TLS, 60 seconds for non-TLS)
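
A classification sketch using the thresholds above (field names are illustrative; the stuck-connection check is omitted):

interface ConnRecord {
  incomingDestroyed: boolean;
  outgoingDestroyed: boolean;
  isTLS: boolean;
  lastActivityMs: number; // timestamp of last socket activity
}

function classifyZombie(conn: ConnRecord, nowMs: number): 'full' | 'half' | null {
  if (conn.incomingDestroyed && conn.outgoingDestroyed) return 'full';
  const graceMs = conn.isTLS ? 5 * 60_000 : 30_000; // 5 min TLS, 30s non-TLS
  if ((conn.incomingDestroyed || conn.outgoingDestroyed) &&
      nowMs - conn.lastActivityMs > graceMs) {
    return 'half';
  }
  return null; // healthy, or still within the grace period
}

// Run from an interval, e.g. setInterval(scan, 10_000).unref()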

Cleanup Queue

Connections are cleaned up through a batched queue system:

  • Batch size: 100 connections
  • Processing triggered immediately when batch size reached
  • Otherwise processed after 100ms delay
  • Prevents overwhelming the system during mass disconnections

Keep-Alive Handling

Keep-alive connections receive special treatment based on the keepAliveTreatment setting (see the sketch after this list):

  • standard: Normal timeout applies
  • extended: Timeout multiplied by keepAliveInactivityMultiplier (default 6x)
  • immortal: No timeout, connections persist indefinitely
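
A sketch of the timeout selection, using the setting names above (the helper itself is hypothetical):

type KeepAliveTreatment = 'standard' | 'extended' | 'immortal';

function effectiveTimeoutMs(
  baseTimeoutMs: number,
  treatment: KeepAliveTreatment,
  keepAliveInactivityMultiplier = 6,
): number | null {
  switch (treatment) {
    case 'standard': return baseTimeoutMs;                                 // normal timeout
    case 'extended': return baseTimeoutMs * keepAliveInactivityMultiplier; // default 6x
    case 'immortal': return null;                                          // never time out
  }
}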

PROXY Protocol

The system supports both receiving and sending PROXY protocol (an example header follows the list):

  • Receiving: Automatically detected from trusted proxy IPs (configured in proxyIPs)
  • Sending: Enabled per-route or globally via sendProxyProtocol setting
  • Real client IP is preserved and used for all connection tracking and security checks
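
For reference, a PROXY protocol v1 header is a single text line prepended to the TCP stream before any payload (format per the HAProxy PROXY protocol specification):

PROXY TCP4 203.0.113.7 10.0.0.1 51234 443\r\n

The fields are protocol family, source IP, destination IP, source port, and destination port; the source address is what gets preserved as the real client IP.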

Metrics and Throughput Calculation

The metrics system tracks throughput using per-second sampling:

  1. Byte Recording: Bytes are recorded as data flows through connections
  2. Sampling: Every second, accumulated bytes are stored as a sample
  3. Rate Calculation: Throughput is calculated by summing bytes over a time window
  4. Per-Route/IP Tracking: Separate ThroughputTracker instances for each route and IP

Key implementation details:

  • Bytes are recorded in the bidirectional forwarding callbacks
  • The instant() method returns throughput over the last 1 second
  • The recent() method returns throughput over the last 10 seconds
  • Custom windows can be specified for different averaging periods
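
In terms of the ThroughputTracker sketch earlier in this document, instant() and recent() amount to different window sizes over the same samples (hypothetical usage, not SmartProxy's public API):

const tracker = new ThroughputTracker();
setInterval(() => tracker.sample(), 1000).unref(); // 1Hz sampling

const instant = tracker.getRate(1);    // instant(): last 1 second
const recent = tracker.getRate(10);    // recent(): last 10 seconds
const fiveMin = tracker.getRate(300);  // custom window: 5-minute average
console.log(instant.rx, recent.rx, fiveMin.rx); // bytes/second received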

Throughput Spikes Issue

There's a fundamental difference between application-layer and network-layer throughput:

Application Layer (what we measure):

  • Bytes are recorded when delivered to/from the application
  • Large chunks can arrive "instantly" due to kernel/Node.js buffering
  • Shows spikes when buffers are flushed (e.g., 20MB in 1 second = 160 Mbit/s)

Network Layer (what UniFi shows):

  • Actual packet flow through the network interface
  • Limited by physical network speed (e.g., 20 Mbit/s)
  • Data transfers over time, not in bursts

The spikes occur because:

  1. Data flows over network at 20 Mbit/s (takes 8 seconds for 20MB)
  2. Kernel/Node.js buffers this incoming data
  3. When buffer is flushed, application receives large chunk at once
  4. We record entire chunk in current second, creating artificial spike

Potential Solutions:

  1. Use longer window for "instant" measurements (e.g., 5 seconds instead of 1)
  2. Track socket write backpressure to estimate actual network flow
  3. Implement bandwidth estimation based on connection duration
  4. Accept that application-layer != network-layer throughput

Connection Limiting

Per-IP Connection Limits

  • SmartProxy tracks connections per IP address in the SecurityManager
  • Default limit is 100 connections per IP (configurable via maxConnectionsPerIP)
  • Connection rate limiting is also enforced (default 300 connections/minute per IP)
  • HttpProxy has been enhanced to also enforce per-IP limits when forwarding from SmartProxy

Route-Level Connection Limits

  • Routes can define security.maxConnections to limit connections per route
  • ConnectionManager tracks connections by route ID using a separate Map
  • Limits are enforced in RouteConnectionHandler before forwarding
  • Connection is tracked when route is matched: trackConnectionByRoute(routeId, connectionId)
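
A hedged configuration sketch (the placement of security.maxConnections follows the name above and should be verified against the route interface):

{
  match: { ports: 443, domains: 'api.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend', port: 8080 }],
    tls: { mode: 'terminate', certificate: 'auto' }
  },
  security: { maxConnections: 50 }  // per-route cap, checked before forwarding
}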

HttpProxy Integration

  • When SmartProxy forwards to HttpProxy for TLS termination, it sends a CLIENT_IP:<ip>\r\n header
  • HttpProxy parses this header to track the real client IP, not the localhost IP
  • This ensures per-IP limits are enforced even for forwarded connections
  • The header is parsed in the connection handler before any data processing
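
A sketch of stripping that prefix from the first chunk (assumes the whole header arrives in the first read; the actual parser may differ):

function parseClientIp(firstChunk: Buffer): { clientIp: string | null; rest: Buffer } {
  const prefix = 'CLIENT_IP:';
  const lineEnd = firstChunk.indexOf('\r\n');
  if (lineEnd > prefix.length && firstChunk.toString('utf8', 0, prefix.length) === prefix) {
    return {
      clientIp: firstChunk.toString('utf8', prefix.length, lineEnd),
      rest: firstChunk.subarray(lineEnd + 2), // payload begins after the header
    };
  }
  return { clientIp: null, rest: firstChunk }; // no header present
}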

Memory Optimization

  • Periodic cleanup runs every 60 seconds to remove:
    • IPs with no active connections
    • Expired rate limit timestamps (older than 1 minute)
  • Prevents memory accumulation from many unique IPs over time
  • Cleanup is automatic and runs in the background with unref() so it does not keep the process alive

Connection Cleanup Queue

  • Cleanup queue processes connections in batches to prevent overwhelming the system
  • Race condition prevention using isProcessingCleanup flag
  • Try-finally block ensures flag is always reset even if errors occur
  • New connections added during processing are queued for next batch
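
A sketch of that queue with the guard and try-finally described above (names are illustrative):

class CleanupQueue {
  private queue = new Set<string>();
  private isProcessingCleanup = false;
  private timer: NodeJS.Timeout | null = null;

  public add(connectionId: string): void {
    this.queue.add(connectionId);
    if (this.queue.size >= 100) {
      this.process(); // batch size reached: process immediately
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.process(), 100); // otherwise after 100ms
    }
  }

  private process(): void {
    if (this.isProcessingCleanup) return; // additions during a run wait for the next batch
    this.isProcessingCleanup = true;
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    try {
      for (const id of [...this.queue].slice(0, 100)) {
        this.queue.delete(id);
        // ...destroy sockets and release per-route/per-IP tracking for `id`
      }
    } finally {
      this.isProcessingCleanup = false; // always reset, even if cleanup throws
    }
  }
}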

Important Implementation Notes

  • Always use NodeJS.Timeout type instead of NodeJS.Timer for interval/timeout references
  • IPv4/IPv6 normalization is handled (e.g., ::ffff:127.0.0.1 and 127.0.0.1 are treated as the same IP)
  • Connection limits are checked before route matching to prevent DoS attacks
  • SharedSecurityManager supports checking route-level limits via optional parameter

Log Deduplication

To reduce log spam during high-traffic scenarios or attacks, SmartProxy implements log deduplication for repetitive events:

How It Works

  • Similar log events are batched and aggregated over a 5-second window
  • Instead of logging each event individually, a summary is emitted
  • Events are grouped by type and deduplicated by key (e.g., IP address, reason)
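
A minimal sketch of such an aggregator (event names and shapes are illustrative):

class LogDeduplicator {
  private counts = new Map<string, Map<string, number>>(); // event type -> key -> count

  constructor(flushIntervalMs = 5000) {
    setInterval(() => this.flush(), flushIntervalMs).unref();
  }

  public record(eventType: string, key: string): void {
    const byKey = this.counts.get(eventType) ?? new Map<string, number>();
    byKey.set(key, (byKey.get(key) ?? 0) + 1);
    this.counts.set(eventType, byKey);
  }

  private flush(): void {
    for (const [eventType, byKey] of this.counts) {
      const total = [...byKey.values()].reduce((a, b) => a + b, 0);
      const breakdown = [...byKey.entries()].map(([k, n]) => `${k}: ${n}`).join(', ');
      console.log(`[SUMMARY] ${eventType}: ${total} events in 5s (${breakdown})`);
    }
    this.counts.clear();
  }
}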

Deduplicated Event Types

  1. Connection Rejections (connection-rejected):

    • Groups by rejection reason (global-limit, route-limit, etc.)
    • Example: "Rejected 150 connections (reasons: global-limit: 100, route-limit: 50)"
  2. IP Rejections (ip-rejected):

    • Groups by IP address
    • Shows top offenders with rejection counts and reasons
    • Example: "Rejected 500 connections from 10 IPs (top offenders: 192.168.1.100 (200x, rate-limit), ...)"
  3. Connection Cleanups (connection-cleanup):

    • Groups by cleanup reason (normal, timeout, error, zombie, etc.)
    • Example: "Cleaned up 250 connections (reasons: normal: 200, timeout: 30, error: 20)"
  4. IP Tracking Cleanup (ip-cleanup):

    • Summarizes periodic IP cleanup operations
    • Example: "IP tracking cleanup: removed 50 entries across 5 cleanup cycles"

Configuration

  • Default flush interval: 5 seconds
  • Maximum batch size: 100 events (triggers immediate flush)
  • Global periodic flush: Every 10 seconds (ensures logs are emitted regularly)
  • Process exit handling: Logs are flushed on SIGINT/SIGTERM

Benefits

  • Reduces log volume during attacks or high traffic
  • Provides better overview of patterns (e.g., which IPs are attacking)
  • Improves log readability and analysis
  • Prevents log storage overflow
  • Maintains detailed information in aggregated form

Log Output Examples

Instead of hundreds of individual logs:

Connection rejected
Connection rejected
Connection rejected
... (repeated 500 times)

You'll see:

[SUMMARY] Rejected 500 connections from 10 IPs in 5s (rate-limit: 350, per-ip-limit: 150) (top offenders: 192.168.1.100 (200x, rate-limit), 10.0.0.1 (150x, per-ip-limit))

Instead of:

Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 266
Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 265
... (repeated 266 times)

You'll see:

[SUMMARY] 266 HttpProxy connections terminated in 5s (reasons: client_closed: 266, activeConnections: 0)

Rapid Event Handling

  • During attacks or high-volume scenarios, logs are flushed more frequently
  • If 50+ events occur within 1 second, immediate flush is triggered
  • Prevents memory buildup during flooding attacks
  • Maintains real-time visibility during incidents

Custom Certificate Provision Function

The certProvisionFunction feature has been implemented to allow users to provide their own certificate generation logic.

Implementation Details

  1. Type Definition: The function must return Promise<TSmartProxyCertProvisionObject> where:

    • TSmartProxyCertProvisionObject = plugins.tsclass.network.ICert | 'http01'
    • Return 'http01' to fallback to Let's Encrypt
    • Return a certificate object for custom certificates
  2. Certificate Manager Changes:

    • Added certProvisionFunction property to CertificateManager
    • Modified provisionAcmeCertificate() to check custom function first
    • Custom certificates are stored with source type 'custom'
    • Expiry date extraction currently defaults to 90 days
  3. Configuration Options:

    • certProvisionFunction: The custom provision function
    • certProvisionFallbackToAcme: Whether to fall back to ACME on error (default: true)
  4. Usage Example:

new SmartProxy({
  certProvisionFunction: async (domain: string) => {
    if (domain === 'internal.example.com') {
      return {
        cert: customCert,
        key: customKey,
        ca: customCA
      } as unknown as TSmartProxyCertProvisionObject;
    }
    return 'http01'; // Use Let's Encrypt
  },
  certProvisionFallbackToAcme: true
})
  5. Testing Notes:
    • Type assertions through unknown are needed in tests due to strict interface typing
    • Mock certificate objects work for testing but need proper type casting
    • The actual certificate parsing for expiry dates would need a proper X.509 parser

Future Improvements

  1. Implement proper certificate expiry date extraction using X.509 parsing
  2. Add support for returning expiry date with custom certificates
  3. Consider adding validation for custom certificate format
  4. Add events/hooks for certificate provisioning lifecycle

HTTPS/TLS Configuration Guide

SmartProxy supports three TLS modes for handling HTTPS traffic. Understanding when to use each mode is crucial for correct configuration.

TLS Mode: Passthrough (SNI Routing)

When to use: Backend server handles its own TLS certificates.

How it works:

  1. Client connects with TLS ClientHello containing SNI (Server Name Indication)
  2. SmartProxy extracts the SNI hostname without decrypting
  3. Connection is forwarded to backend as-is (still encrypted)
  4. Backend server terminates TLS with its own certificate

Configuration:

{
  match: { ports: 443, domains: 'backend.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend-server', port: 443 }],
    tls: { mode: 'passthrough' }
  }
}

Requirements:

  • Backend must have valid TLS certificate for the domain
  • Client's SNI must be present (session tickets without SNI will be rejected)
  • No HTTP-level inspection possible (encrypted end-to-end)

TLS Mode: Terminate

When to use: SmartProxy handles TLS, backend receives plain HTTP.

How it works:

  1. Client connects with TLS ClientHello
  2. SmartProxy terminates TLS (decrypts traffic)
  3. Decrypted HTTP is forwarded to backend on plain HTTP port
  4. Backend receives unencrypted traffic

Configuration:

{
  match: { ports: 443, domains: 'api.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'localhost', port: 8080 }],  // HTTP backend
    tls: {
      mode: 'terminate',
      certificate: 'auto'  // Let's Encrypt, or provide { key, cert }
    }
  }
}

Requirements:

  • ACME email configured for auto certificates: acme: { email: 'admin@example.com' }
  • Port 80 available for HTTP-01 challenges (or use DNS-01)
  • Backend accessible on HTTP port

TLS Mode: Terminate and Re-encrypt

When to use: SmartProxy handles client TLS, but backend also requires TLS.

How it works:

  1. Client connects with TLS ClientHello
  2. SmartProxy terminates client TLS (decrypts)
  3. SmartProxy creates new TLS connection to backend
  4. Traffic is re-encrypted for the backend connection

Configuration:

{
  match: { ports: 443, domains: 'secure.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend-tls', port: 443 }],  // HTTPS backend
    tls: {
      mode: 'terminate-and-reencrypt',
      certificate: 'auto'
    }
  }
}

Requirements:

  • Same as 'terminate' mode
  • Backend must have valid TLS (can be self-signed for internal use)

HttpProxy Integration

For TLS termination modes (terminate and terminate-and-reencrypt), SmartProxy uses an internal HttpProxy component:

  • HttpProxy listens on an internal port (default: 8443)
  • SmartProxy forwards TLS connections to HttpProxy for termination
  • Client IP is preserved via CLIENT_IP: header protocol
  • HTTP/2 and WebSocket are supported after TLS termination

Configuration:

{
  useHttpProxy: [443],        // Ports that use HttpProxy for TLS termination
  httpProxyPort: 8443,        // Internal HttpProxy port
  acme: {
    email: 'admin@example.com',
    useProduction: true       // false for Let's Encrypt staging
  }
}

Common Configuration Patterns

HTTP to HTTPS Redirect:

import { createHttpToHttpsRedirect } from '@push.rocks/smartproxy';

const redirectRoute = createHttpToHttpsRedirect(['example.com', 'www.example.com']);

Complete HTTPS Server (with redirect):

import { createCompleteHttpsServer } from '@push.rocks/smartproxy';

const routes = createCompleteHttpsServer(
  'example.com',
  { host: 'localhost', port: 8080 },
  { certificate: 'auto' }
);

Load Balancer with Health Checks:

import { createLoadBalancerRoute } from '@push.rocks/smartproxy';

const lbRoute = createLoadBalancerRoute(
  'api.example.com',
  [
    { host: 'backend1', port: 8080 },
    { host: 'backend2', port: 8080 },
    { host: 'backend3', port: 8080 }
  ],
  { tls: { mode: 'terminate', certificate: 'auto' } }
);

Troubleshooting

"No SNI detected" errors:

  • Client is using TLS session resumption without SNI
  • Solution: Configure route for TLS termination (allows session resumption)

"HttpProxy not available" errors:

  • useHttpProxy not configured for the port
  • Solution: Add port to useHttpProxy array in settings

Certificate provisioning failures:

  • Port 80 not accessible for HTTP-01 challenges
  • ACME email not configured
  • Solution: Ensure port 80 is available and acme.email is set

Connection timeouts to HttpProxy:

  • CLIENT_IP header parsing timeout (default: 2000ms)
  • Network congestion between SmartProxy and HttpProxy
  • Solution: Check localhost connectivity, increase timeout if needed