SmartProxy Development Hints

Byte Tracking and Metrics

Throughput Drift Issue (Fixed)

Problem: Throughput numbers were gradually increasing over time for long-lived connections.

Root Cause: The byRoute() and byIP() methods were dividing cumulative total bytes (since connection start) by the window duration, causing rates to appear higher as connections aged:

  • Hour 1: 1GB total / 60s = 17 MB/s ✓
  • Hour 2: 2GB total / 60s = 34 MB/s ✗ (appears doubled!)
  • Hour 3: 3GB total / 60s = 51 MB/s ✗ (keeps rising!)

Solution: Implemented dedicated ThroughputTracker instances for each route and IP address (sketched after the list below):

  • Each route and IP gets its own throughput tracker with per-second sampling
  • Samples are taken every second and stored in a circular buffer
  • Rate calculations use actual samples within the requested window
  • Default window is now 1 second for real-time accuracy
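
A minimal sketch of such a tracker, with illustrative names and shapes (not SmartProxy's actual class):

class ThroughputTracker {
  private samples: { rx: number; tx: number }[];
  private writeIndex = 0;
  private pendingRx = 0;
  private pendingTx = 0;

  constructor(retentionSeconds = 3600) {
    // Circular buffer: one slot per second of retention (default: 1 hour).
    this.samples = Array.from({ length: retentionSeconds }, () => ({ rx: 0, tx: 0 }));
  }

  // Called from the data-flow callbacks as bytes pass through.
  public recordBytes(rx: number, tx: number): void {
    this.pendingRx += rx;
    this.pendingTx += tx;
  }

  // Called once per second (1Hz) by a sampling timer.
  public sample(): void {
    this.samples[this.writeIndex] = { rx: this.pendingRx, tx: this.pendingTx };
    this.writeIndex = (this.writeIndex + 1) % this.samples.length;
    this.pendingRx = 0;
    this.pendingTx = 0;
  }

  // Average bytes/second over the last N samples only, so cumulative
  // totals can no longer inflate the rate of long-lived connections.
  public getRate(windowSeconds = 1): { rx: number; tx: number } {
    const window = Math.min(windowSeconds, this.samples.length);
    let rx = 0;
    let tx = 0;
    for (let i = 1; i <= window; i++) {
      const idx = (this.writeIndex - i + this.samples.length) % this.samples.length;
      rx += this.samples[idx].rx;
      tx += this.samples[idx].tx;
    }
    return { rx: rx / window, tx: tx / window };
  }
}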

What Gets Counted (Network Interface Throughput)

The byte tracking is designed to match network interface throughput (what UniFi and similar network monitoring tools show):

Counted bytes include:

  • All application data
  • TLS handshakes and protocol overhead
  • TLS record headers and encryption padding
  • HTTP headers and protocol data
  • WebSocket frames and protocol overhead
  • TLS alerts sent to clients

NOT counted:

  • PROXY protocol headers (sent to backend, not client)
  • TCP/IP headers (handled by OS, not visible at application layer)

Byte direction:

  • bytesReceived: All bytes received FROM the client on the incoming connection
  • bytesSent: All bytes sent TO the client on the incoming connection
  • Backend connections are separate and not mixed with client metrics

Double Counting Issue (Fixed)

Problem: Initial data chunks were being counted twice in the byte tracking:

  1. Once when stored in pendingData in setupDirectConnection()
  2. Again when the data flowed through bidirectional forwarding

Solution: Removed the byte counting when storing initial chunks. Bytes are now only counted when they actually flow through the setupBidirectionalForwarding() callbacks.

HttpProxy Metrics (Fixed)

Problem: HttpProxy forwarding was updating connection record byte counts but not calling metricsCollector.recordBytes(), resulting in missing throughput data.

Solution: Added metricsCollector.recordBytes() calls to the HttpProxy bidirectional forwarding callbacks.
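
Both fixes converge on the same rule: count bytes exactly once, at the point where data actually flows. A minimal sketch of that pattern (the record and collector shapes are assumptions, not SmartProxy's actual signatures; backpressure and error handling omitted):

import * as net from 'net';

interface ByteRecord { bytesReceived: number; bytesSent: number }

function forwardWithCounting(
  client: net.Socket,
  backend: net.Socket,
  record: ByteRecord,
  recordBytes: (rx: number, tx: number) => void, // feeds the throughput trackers
): void {
  client.on('data', (chunk: Buffer) => {
    record.bytesReceived += chunk.length; // per-connection total
    recordBytes(chunk.length, 0);         // metrics, counted exactly once
    backend.write(chunk);
  });
  backend.on('data', (chunk: Buffer) => {
    record.bytesSent += chunk.length;
    recordBytes(0, chunk.length);
    client.write(chunk);
  });
}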

Metrics Architecture

The metrics system has multiple layers:

  1. Connection Records (record.bytesReceived/bytesSent): Track total bytes per connection
  2. Global ThroughputTracker: Accumulates bytes between samples for overall rate calculations
  3. Per-Route ThroughputTrackers: Dedicated tracker for each route with per-second sampling
  4. Per-IP ThroughputTrackers: Dedicated tracker for each IP with per-second sampling
  5. connectionByteTrackers: Track cumulative bytes and metadata for active connections

Key features:

  • All throughput trackers sample every second (1Hz)
  • Each tracker maintains a circular buffer of samples (default: 1 hour retention)
  • Rate calculations are accurate for any requested window (default: 1 second)
  • All byte counting happens exactly once at the data flow point
  • Unused route/IP trackers are automatically cleaned up when connections close

Understanding "High" Byte Counts

If byte counts seem high compared to actual application data, remember:

  • TLS handshakes can be 1-5KB depending on cipher suites and certificates
  • Each TLS record has 5 bytes of header overhead
  • TLS encryption adds 16-48 bytes of padding/MAC per record
  • HTTP/2 has additional framing overhead
  • WebSocket has frame headers (2-14 bytes per message)

This overhead is real network traffic and should be counted for accurate throughput metrics.
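
To put numbers on it (illustrative figures): a 1 MiB response sent as 64 full-size 16 KiB TLS records adds 64 × (5-byte header + ~32 bytes MAC/padding) ≈ 2.3 KiB, well under 1% overhead, whereas a 1-5 KB handshake in front of a 500-byte API response more than doubles the bytes on the wire.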

Byte Counting Paths

There are two mutually exclusive paths for connections:

  1. Direct forwarding (route-connection-handler.ts):

    • Used for TCP passthrough, TLS passthrough, and direct connections
    • Bytes counted in setupBidirectionalForwarding callbacks
    • Initial chunk NOT counted separately (flows through bidirectional forwarding)
  2. HttpProxy forwarding (http-proxy-bridge.ts):

    • Used for TLS termination (terminate, terminate-and-reencrypt)
    • Initial chunk counted when written to proxy
    • All subsequent bytes counted in setupBidirectionalForwarding callbacks
    • This is the ONLY counting point for these connections

Byte Counting Audit (2025-01-06)

A comprehensive audit was performed to verify byte counting accuracy:

Audit Results:

  • No double counting detected in any connection flow
  • Each byte counted exactly once in each direction
  • Connection records and metrics updated consistently
  • PROXY protocol headers correctly excluded from client metrics
  • NFTables-forwarded connections correctly not counted (the kernel handles the traffic, so it never reaches the application layer)

Key Implementation Points:

  • All byte counting happens in only 2 files: route-connection-handler.ts and http-proxy-bridge.ts
  • Both use the same pattern: increment record.bytesReceived/Sent AND call metricsCollector.recordBytes()
  • Initial chunks handled correctly: stored but not counted until forwarded
  • TLS alerts counted as sent bytes (correct - they are sent to client)

For full audit details, see readme.byte-counting-audit.md

Connection Cleanup

Zombie Connection Detection

The connection manager performs comprehensive zombie detection every 10 seconds (a classification sketch follows the list):

  • Full zombies: Both incoming and outgoing sockets destroyed but connection not cleaned up
  • Half zombies: One socket destroyed, grace period expired (5 minutes for TLS, 30 seconds for non-TLS)
  • Stuck connections: Data received but none sent back after threshold (5 minutes for TLS, 60 seconds for non-TLS)
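
A classification sketch using the thresholds above (field names are illustrative; the stuck-connection check is omitted):

interface ConnRecord {
  incomingDestroyed: boolean;
  outgoingDestroyed: boolean;
  isTLS: boolean;
  lastActivityMs: number; // timestamp of last socket activity
}

function classifyZombie(conn: ConnRecord, nowMs: number): 'full' | 'half' | null {
  if (conn.incomingDestroyed && conn.outgoingDestroyed) return 'full';
  const graceMs = conn.isTLS ? 5 * 60_000 : 30_000; // 5 min TLS, 30s non-TLS
  if ((conn.incomingDestroyed || conn.outgoingDestroyed) &&
      nowMs - conn.lastActivityMs > graceMs) {
    return 'half';
  }
  return null; // healthy, or still within the grace period
}

// Run from an interval, e.g. setInterval(scan, 10_000).unref()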

Cleanup Queue

Connections are cleaned up through a batched queue system:

  • Batch size: 100 connections
  • Processing triggered immediately when batch size reached
  • Otherwise processed after 100ms delay
  • Prevents overwhelming the system during mass disconnections

Keep-Alive Handling

Keep-alive connections receive special treatment based on the keepAliveTreatment setting (see the sketch after this list):

  • standard: Normal timeout applies
  • extended: Timeout multiplied by keepAliveInactivityMultiplier (default 6x)
  • immortal: No timeout, connections persist indefinitely
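
A sketch of the timeout selection, using the setting names above (the helper itself is hypothetical):

type KeepAliveTreatment = 'standard' | 'extended' | 'immortal';

function effectiveTimeoutMs(
  baseTimeoutMs: number,
  treatment: KeepAliveTreatment,
  keepAliveInactivityMultiplier = 6,
): number | null {
  switch (treatment) {
    case 'standard': return baseTimeoutMs;                                 // normal timeout
    case 'extended': return baseTimeoutMs * keepAliveInactivityMultiplier; // default 6x
    case 'immortal': return null;                                          // never time out
  }
}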

PROXY Protocol

The system supports both receiving and sending PROXY protocol (an example header follows the list):

  • Receiving: Automatically detected from trusted proxy IPs (configured in proxyIPs)
  • Sending: Enabled per-route or globally via sendProxyProtocol setting
  • Real client IP is preserved and used for all connection tracking and security checks
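
For reference, a PROXY protocol v1 header is a single text line prepended to the TCP stream before any payload (format per the HAProxy PROXY protocol specification):

PROXY TCP4 203.0.113.7 10.0.0.1 51234 443\r\n

The fields are protocol family, source IP, destination IP, source port, and destination port; the source address is what gets preserved as the real client IP.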

Metrics and Throughput Calculation

The metrics system tracks throughput using per-second sampling:

  1. Byte Recording: Bytes are recorded as data flows through connections
  2. Sampling: Every second, accumulated bytes are stored as a sample
  3. Rate Calculation: Throughput is calculated by summing bytes over a time window
  4. Per-Route/IP Tracking: Separate ThroughputTracker instances for each route and IP

Key implementation details:

  • Bytes are recorded in the bidirectional forwarding callbacks
  • The instant() method returns throughput over the last 1 second
  • The recent() method returns throughput over the last 10 seconds
  • Custom windows can be specified for different averaging periods
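
In terms of the ThroughputTracker sketch earlier in this document, instant() and recent() amount to different window sizes over the same samples (hypothetical usage, not SmartProxy's public API):

const tracker = new ThroughputTracker();
setInterval(() => tracker.sample(), 1000).unref(); // 1Hz sampling

const instant = tracker.getRate(1);    // instant(): last 1 second
const recent = tracker.getRate(10);    // recent(): last 10 seconds
const fiveMin = tracker.getRate(300);  // custom window: 5-minute average
console.log(instant.rx, recent.rx, fiveMin.rx); // bytes/second received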

Throughput Spikes Issue

There's a fundamental difference between application-layer and network-layer throughput:

Application Layer (what we measure):

  • Bytes are recorded when delivered to/from the application
  • Large chunks can arrive "instantly" due to kernel/Node.js buffering
  • Shows spikes when buffers are flushed (e.g., 20MB in 1 second = 160 Mbit/s)

Network Layer (what UniFi shows):

  • Actual packet flow through the network interface
  • Limited by physical network speed (e.g., 20 Mbit/s)
  • Data transfers over time, not in bursts

The spikes occur because:

  1. Data flows over network at 20 Mbit/s (takes 8 seconds for 20MB)
  2. Kernel/Node.js buffers this incoming data
  3. When buffer is flushed, application receives large chunk at once
  4. We record entire chunk in current second, creating artificial spike

Potential Solutions:

  1. Use longer window for "instant" measurements (e.g., 5 seconds instead of 1)
  2. Track socket write backpressure to estimate actual network flow
  3. Implement bandwidth estimation based on connection duration
  4. Accept that application-layer != network-layer throughput

Connection Limiting

Per-IP Connection Limits

  • SmartProxy tracks connections per IP address in the SecurityManager
  • Default limit is 100 connections per IP (configurable via maxConnectionsPerIP)
  • Connection rate limiting is also enforced (default 300 connections/minute per IP)
  • HttpProxy has been enhanced to also enforce per-IP limits when forwarding from SmartProxy

Route-Level Connection Limits

  • Routes can define security.maxConnections to limit connections per route
  • ConnectionManager tracks connections by route ID using a separate Map
  • Limits are enforced in RouteConnectionHandler before forwarding
  • Connection is tracked when route is matched: trackConnectionByRoute(routeId, connectionId)
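
A hedged configuration sketch (the placement of security.maxConnections follows the name above and should be verified against the route interface):

{
  match: { ports: 443, domains: 'api.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend', port: 8080 }],
    tls: { mode: 'terminate', certificate: 'auto' }
  },
  security: { maxConnections: 50 }  // per-route cap, checked before forwarding
}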

HttpProxy Integration

  • When SmartProxy forwards to HttpProxy for TLS termination, it sends a CLIENT_IP:<ip>\r\n header
  • HttpProxy parses this header to track the real client IP, not the localhost IP
  • This ensures per-IP limits are enforced even for forwarded connections
  • The header is parsed in the connection handler before any data processing
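
A sketch of stripping that prefix from the first chunk (assumes the whole header arrives in the first read; the actual parser may differ):

function parseClientIp(firstChunk: Buffer): { clientIp: string | null; rest: Buffer } {
  const prefix = 'CLIENT_IP:';
  const lineEnd = firstChunk.indexOf('\r\n');
  if (lineEnd > prefix.length && firstChunk.toString('utf8', 0, prefix.length) === prefix) {
    return {
      clientIp: firstChunk.toString('utf8', prefix.length, lineEnd),
      rest: firstChunk.subarray(lineEnd + 2), // payload begins after the header
    };
  }
  return { clientIp: null, rest: firstChunk }; // no header present
}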

Memory Optimization

  • Periodic cleanup runs every 60 seconds to remove:
    • IPs with no active connections
    • Expired rate limit timestamps (older than 1 minute)
  • Prevents memory accumulation from many unique IPs over time
  • Cleanup is automatic and runs in the background with unref() so it does not keep the process alive

Connection Cleanup Queue

  • Cleanup queue processes connections in batches to prevent overwhelming the system
  • Race condition prevention using isProcessingCleanup flag
  • Try-finally block ensures flag is always reset even if errors occur
  • New connections added during processing are queued for next batch
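
A sketch of that queue with the guard and try-finally described above (names are illustrative):

class CleanupQueue {
  private queue = new Set<string>();
  private isProcessingCleanup = false;
  private timer: NodeJS.Timeout | null = null;

  public add(connectionId: string): void {
    this.queue.add(connectionId);
    if (this.queue.size >= 100) {
      this.process(); // batch size reached: process immediately
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.process(), 100); // otherwise after 100ms
    }
  }

  private process(): void {
    if (this.isProcessingCleanup) return; // additions during a run wait for the next batch
    this.isProcessingCleanup = true;
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    try {
      for (const id of [...this.queue].slice(0, 100)) {
        this.queue.delete(id);
        // ...destroy sockets and release per-route/per-IP tracking for `id`
      }
    } finally {
      this.isProcessingCleanup = false; // always reset, even if cleanup throws
    }
  }
}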

Important Implementation Notes

  • Always use NodeJS.Timeout type instead of NodeJS.Timer for interval/timeout references
  • IPv4/IPv6 normalization is handled (e.g., ::ffff:127.0.0.1 and 127.0.0.1 are treated as the same IP)
  • Connection limits are checked before route matching to prevent DoS attacks
  • SharedSecurityManager supports checking route-level limits via optional parameter

Log Deduplication

To reduce log spam during high-traffic scenarios or attacks, SmartProxy implements log deduplication for repetitive events:

How It Works

  • Similar log events are batched and aggregated over a 5-second window
  • Instead of logging each event individually, a summary is emitted
  • Events are grouped by type and deduplicated by key (e.g., IP address, reason)
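
A minimal sketch of such an aggregator (event names and shapes are illustrative):

class LogDeduplicator {
  private counts = new Map<string, Map<string, number>>(); // event type -> key -> count

  constructor(flushIntervalMs = 5000) {
    setInterval(() => this.flush(), flushIntervalMs).unref();
  }

  public record(eventType: string, key: string): void {
    const byKey = this.counts.get(eventType) ?? new Map<string, number>();
    byKey.set(key, (byKey.get(key) ?? 0) + 1);
    this.counts.set(eventType, byKey);
  }

  private flush(): void {
    for (const [eventType, byKey] of this.counts) {
      const total = [...byKey.values()].reduce((a, b) => a + b, 0);
      const breakdown = [...byKey.entries()].map(([k, n]) => `${k}: ${n}`).join(', ');
      console.log(`[SUMMARY] ${eventType}: ${total} events in 5s (${breakdown})`);
    }
    this.counts.clear();
  }
}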

Deduplicated Event Types

  1. Connection Rejections (connection-rejected):

    • Groups by rejection reason (global-limit, route-limit, etc.)
    • Example: "Rejected 150 connections (reasons: global-limit: 100, route-limit: 50)"
  2. IP Rejections (ip-rejected):

    • Groups by IP address
    • Shows top offenders with rejection counts and reasons
    • Example: "Rejected 500 connections from 10 IPs (top offenders: 192.168.1.100 (200x, rate-limit), ...)"
  3. Connection Cleanups (connection-cleanup):

    • Groups by cleanup reason (normal, timeout, error, zombie, etc.)
    • Example: "Cleaned up 250 connections (reasons: normal: 200, timeout: 30, error: 20)"
  4. IP Tracking Cleanup (ip-cleanup):

    • Summarizes periodic IP cleanup operations
    • Example: "IP tracking cleanup: removed 50 entries across 5 cleanup cycles"

Configuration

  • Default flush interval: 5 seconds
  • Maximum batch size: 100 events (triggers immediate flush)
  • Global periodic flush: Every 10 seconds (ensures logs are emitted regularly)
  • Process exit handling: Logs are flushed on SIGINT/SIGTERM

Benefits

  • Reduces log volume during attacks or high traffic
  • Provides better overview of patterns (e.g., which IPs are attacking)
  • Improves log readability and analysis
  • Prevents log storage overflow
  • Maintains detailed information in aggregated form

Log Output Examples

Instead of hundreds of individual logs:

Connection rejected
Connection rejected
Connection rejected
... (repeated 500 times)

You'll see:

[SUMMARY] Rejected 500 connections from 10 IPs in 5s (rate-limit: 350, per-ip-limit: 150) (top offenders: 192.168.1.100 (200x, rate-limit), 10.0.0.1 (150x, per-ip-limit))

Instead of:

Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 266
Connection terminated: ::ffff:127.0.0.1 (client_closed). Active: 265
... (repeated 266 times)

You'll see:

[SUMMARY] 266 HttpProxy connections terminated in 5s (reasons: client_closed: 266, activeConnections: 0)

Rapid Event Handling

  • During attacks or high-volume scenarios, logs are flushed more frequently
  • If 50+ events occur within 1 second, immediate flush is triggered
  • Prevents memory buildup during flooding attacks
  • Maintains real-time visibility during incidents

Custom Certificate Provision Function

The certProvisionFunction feature has been implemented to allow users to provide their own certificate generation logic.

Implementation Details

  1. Type Definition: The function must return Promise<TSmartProxyCertProvisionObject> where:

    • TSmartProxyCertProvisionObject = plugins.tsclass.network.ICert | 'http01'
    • Return 'http01' to fallback to Let's Encrypt
    • Return a certificate object for custom certificates
  2. Certificate Manager Changes:

    • Added certProvisionFunction property to CertificateManager
    • Modified provisionAcmeCertificate() to check custom function first
    • Custom certificates are stored with source type 'custom'
    • Expiry date extraction currently defaults to 90 days
  3. Configuration Options:

    • certProvisionFunction: The custom provision function
    • certProvisionFallbackToAcme: Whether to fall back to ACME on error (default: true)
  4. Usage Example:

new SmartProxy({
  certProvisionFunction: async (domain: string) => {
    if (domain === 'internal.example.com') {
      return {
        cert: customCert,
        key: customKey,
        ca: customCA
      } as unknown as TSmartProxyCertProvisionObject;
    }
    return 'http01'; // Use Let's Encrypt
  },
  certProvisionFallbackToAcme: true
})
  5. Testing Notes:
    • Type assertions through unknown are needed in tests due to strict interface typing
    • Mock certificate objects work for testing but need proper type casting
    • The actual certificate parsing for expiry dates would need a proper X.509 parser

Future Improvements

  1. Implement proper certificate expiry date extraction using X.509 parsing
  2. Add support for returning expiry date with custom certificates
  3. Consider adding validation for custom certificate format
  4. Add events/hooks for certificate provisioning lifecycle

HTTPS/TLS Configuration Guide

SmartProxy supports three TLS modes for handling HTTPS traffic. Understanding when to use each mode is crucial for correct configuration.

TLS Mode: Passthrough (SNI Routing)

When to use: Backend server handles its own TLS certificates.

How it works:

  1. Client connects with TLS ClientHello containing SNI (Server Name Indication)
  2. SmartProxy extracts the SNI hostname without decrypting
  3. Connection is forwarded to backend as-is (still encrypted)
  4. Backend server terminates TLS with its own certificate

Configuration:

{
  match: { ports: 443, domains: 'backend.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend-server', port: 443 }],
    tls: { mode: 'passthrough' }
  }
}

Requirements:

  • Backend must have valid TLS certificate for the domain
  • Client's SNI must be present (session tickets without SNI will be rejected)
  • No HTTP-level inspection possible (encrypted end-to-end)

TLS Mode: Terminate

When to use: SmartProxy handles TLS, backend receives plain HTTP.

How it works:

  1. Client connects with TLS ClientHello
  2. SmartProxy terminates TLS (decrypts traffic)
  3. Decrypted HTTP is forwarded to backend on plain HTTP port
  4. Backend receives unencrypted traffic

Configuration:

{
  match: { ports: 443, domains: 'api.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'localhost', port: 8080 }],  // HTTP backend
    tls: {
      mode: 'terminate',
      certificate: 'auto'  // Let's Encrypt, or provide { key, cert }
    }
  }
}

Requirements:

  • ACME email configured for auto certificates: acme: { email: 'admin@example.com' }
  • Port 80 available for HTTP-01 challenges (or use DNS-01)
  • Backend accessible on HTTP port

TLS Mode: Terminate and Re-encrypt

When to use: SmartProxy handles client TLS, but backend also requires TLS.

How it works:

  1. Client connects with TLS ClientHello
  2. SmartProxy terminates client TLS (decrypts)
  3. SmartProxy creates new TLS connection to backend
  4. Traffic is re-encrypted for the backend connection

Configuration:

{
  match: { ports: 443, domains: 'secure.example.com' },
  action: {
    type: 'forward',
    targets: [{ host: 'backend-tls', port: 443 }],  // HTTPS backend
    tls: {
      mode: 'terminate-and-reencrypt',
      certificate: 'auto'
    }
  }
}

Requirements:

  • Same as 'terminate' mode
  • Backend must have valid TLS (can be self-signed for internal use)

HttpProxy Integration

For TLS termination modes (terminate and terminate-and-reencrypt), SmartProxy uses an internal HttpProxy component:

  • HttpProxy listens on an internal port (default: 8443)
  • SmartProxy forwards TLS connections to HttpProxy for termination
  • Client IP is preserved via CLIENT_IP: header protocol
  • HTTP/2 and WebSocket are supported after TLS termination

Configuration:

{
  useHttpProxy: [443],        // Ports that use HttpProxy for TLS termination
  httpProxyPort: 8443,        // Internal HttpProxy port
  acme: {
    email: 'admin@example.com',
    useProduction: true       // false for Let's Encrypt staging
  }
}

Common Configuration Patterns

HTTP to HTTPS Redirect:

import { createHttpToHttpsRedirect } from '@push.rocks/smartproxy';

const redirectRoute = createHttpToHttpsRedirect(['example.com', 'www.example.com']);

Complete HTTPS Server (with redirect):

import { createCompleteHttpsServer } from '@push.rocks/smartproxy';

const routes = createCompleteHttpsServer(
  'example.com',
  { host: 'localhost', port: 8080 },
  { certificate: 'auto' }
);

Load Balancer with Health Checks:

import { createLoadBalancerRoute } from '@push.rocks/smartproxy';

const lbRoute = createLoadBalancerRoute(
  'api.example.com',
  [
    { host: 'backend1', port: 8080 },
    { host: 'backend2', port: 8080 },
    { host: 'backend3', port: 8080 }
  ],
  { tls: { mode: 'terminate', certificate: 'auto' } }
);

Troubleshooting

"No SNI detected" errors:

  • Client is using TLS session resumption without SNI
  • Solution: Configure route for TLS termination (allows session resumption)

"HttpProxy not available" errors:

  • useHttpProxy not configured for the port
  • Solution: Add port to useHttpProxy array in settings

Certificate provisioning failures:

  • Port 80 not accessible for HTTP-01 challenges
  • ACME email not configured
  • Solution: Ensure port 80 is available and acme.email is set

Connection timeouts to HttpProxy:

  • CLIENT_IP header parsing timeout (default: 2000ms)
  • Network congestion between SmartProxy and HttpProxy
  • Solution: Check localhost connectivity, increase timeout if needed