
SmartProxy Project Hints

Project Overview

  • Package: @push.rocks/smartproxy, a high-performance proxy supporting HTTP(S), TCP, WebSocket, and ACME integration.
  • Written in TypeScript, compiled output in dist_ts/, uses ESM with NodeNext resolution.

Important: ACME Configuration in v19.0.0

  • Breaking Change: ACME configuration must be placed within individual route TLS settings, not at the top level
  • Route-level ACME config is the ONLY way to enable SmartAcme initialization
  • SmartCertManager requires email in route config for certificate acquisition
  • Top-level ACME configuration is ignored in v19.0.0

Repository Structure

  • ts/ TypeScript source files:
    • index.ts exports main modules.
    • plugins.ts centralizes native and third-party imports.
    • Subdirectories: networkproxy/, nftablesproxy/, port80handler/, redirect/, smartproxy/.
    • Key classes: ProxyRouter (classes.router.ts), SmartProxy (classes.smartproxy.ts), plus handlers/managers.
  • dist_ts/ transpiled .js and .d.ts files mirroring ts/ structure.
  • test/ test suites in TypeScript:
    • test.router.ts routing logic (hostname matching, wildcards, path parameters, config management).
    • test.smartproxy.ts proxy behavior tests (TCP forwarding, SNI handling, concurrency, chaining, timeouts).
    • test/helpers/ utilities (e.g., certificates).
  • assets/certs/ placeholder certificates for ACME and TLS.

Development Setup

  • Requires pnpm (v10+).
  • Install dependencies: pnpm install.
  • Build: pnpm build (runs tsbuild --web --allowimplicitany).
  • Test: pnpm test (runs tstest test/).
  • Format: pnpm format (runs gitzone format).

How to Test

Test Structure

Tests use tapbundle from @git.zone/tstest. The correct pattern is:

import { tap, expect } from '@git.zone/tstest/tapbundle';

tap.test('test description', async () => {
  // Test logic here
  expect(someValue).toEqual(expectedValue);
});

// IMPORTANT: Must end with tap.start()
tap.start();

Expect Syntax (from @push.rocks/smartexpect)

// Type assertions
expect('hello').toBeTypeofString();
expect(42).toBeTypeofNumber();

// Equality
expect('hithere').toEqual('hithere');

// Negated assertions
expect(1).not.toBeTypeofString();

// Regular expressions
expect('hithere').toMatch(/hi/);

// Numeric comparisons
expect(5).toBeGreaterThan(3);
expect(0.1 + 0.2).toBeCloseTo(0.3, 10);

// Arrays
expect([1, 2, 3]).toContain(2);
expect([1, 2, 3]).toHaveLength(3);

// Async assertions
await expect(asyncFunction()).resolves.toEqual('expected');
await expect(asyncFunction()).resolves.withTimeout(5000).toBeTypeofString();

// Complex object navigation
expect(complexObject)
  .property('users')
  .arrayItem(0)
  .property('name')
  .toEqual('Alice');

Test Modifiers

  • tap.only.test() - Run only this test
  • tap.skip.test() - Skip a test
  • tap.timeout() - Set test-specific timeout
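
A minimal sketch combining these modifiers with the tapbundle pattern shown above (test names and bodies are illustrative):

import { tap, expect } from '@git.zone/tstest/tapbundle';

// Run only this test while debugging; remove .only before committing
tap.only.test('focused test', async () => {
  expect(2 + 2).toEqual(4);
});

// Temporarily disable a test without deleting it
tap.skip.test('not ready yet', async () => {});

tap.start();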

Running Tests

  • All tests: pnpm test
  • Specific test: tsx test/test.router.ts
  • With options: tstest test/**/*.ts --verbose --timeout 60

Test File Requirements

  • Must start with test. prefix
  • Must use .ts extension
  • Must call tap.start() at the end

Coding Conventions

  • Import modules via plugins.ts, using the .js extension in the specifier so it matches the compiled output under NodeNext:
    import * as plugins from './plugins.js';
    const server = new plugins.http.Server();
    
  • Reference plugins with full path: plugins.acme, plugins.smartdelay, plugins.minimatch, etc.
  • Path patterns support globs (*) and parameters (:param) in ProxyRouter.
  • Wildcard hostname matching leverages minimatch patterns.
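
As an illustration of the minimatch-based hostname matching mentioned above (a sketch only, assuming plugins.minimatch re-exports the minimatch function; ProxyRouter's real matching also strips ports and handles paths):

import * as plugins from './plugins.js';

// Hypothetical helper: match a hostname against a wildcard pattern
const matchesHostname = (pattern: string, hostname: string): boolean =>
  plugins.minimatch(hostname.toLowerCase(), pattern.toLowerCase());

matchesHostname('*.example.com', 'API.example.com'); // true
matchesHostname('example.*', 'example.org');         // true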

Key Components

  • ProxyRouter
    • Methods: routeReq, routeReqWithDetails.
    • Hostname matching: case-insensitive, strips port, supports exact, wildcard, TLD, complex patterns.
    • Path routing: exact, wildcard, parameter extraction (pathParams), returns pathMatch and pathRemainder.
    • Config API: setNewProxyConfigs, addProxyConfig, removeProxyConfig, getHostnames, getProxyConfigs.
  • SmartProxy
    • Manages one or more net.Server instances to forward TCP streams.
    • Options: preserveSourceIP, defaultAllowedIPs, globalPortRanges, sniEnabled.
    • DomainConfigManager: round-robin selection for multiple target IPs.
    • Graceful shutdown in stop(), ensures no lingering servers or sockets.
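
A hypothetical usage sketch for the ProxyRouter API (signatures and the return shape are assumptions based on the method and field names above):

import * as http from 'http';
import { ProxyRouter } from './classes.router.js';

// Hypothetical: route a request and inspect the path details
function logRoute(router: ProxyRouter, req: http.IncomingMessage): void {
  const details = router.routeReqWithDetails(req);
  if (details) {
    console.log(details.pathMatch);     // matched path pattern
    console.log(details.pathParams);    // extracted :param values
    console.log(details.pathRemainder); // unmatched path suffix
  }
}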

Notable Points

  • TSConfig: module: NodeNext with verbatimModuleSyntax; TypeScript sources import with .js extensions.
  • Mermaid diagrams and architecture flows in readme.md illustrate component interactions and protocol flows.
  • CLI entrypoint (cli.js) supports command-line usage (ACME, proxy controls).
  • ACME and certificate handling via Port80Handler and helpers.certificates.ts.

ACME/Certificate Configuration Example (v19.0.0)

const proxy = new SmartProxy({
  routes: [{
    name: 'example.com',
    match: { domains: 'example.com', ports: 443 },
    action: {
      type: 'forward',
      target: { host: 'localhost', port: 8080 },
      tls: {
        mode: 'terminate',
        certificate: 'auto',
        acme: {  // ACME config MUST be here, not at top level
          email: 'ssl@example.com',
          useProduction: false,
          challengePort: 80
        }
      }
    }
  }]
});

TODOs / Considerations

  • Ensure import extensions in source match build outputs (.ts vs .js).
  • Update plugins.ts when adding new dependencies.
  • Maintain test coverage for new routing or proxy features.
  • Keep ts/ and dist_ts/ in sync after refactors.
  • Consider implementing top-level ACME config support for backward compatibility.

HTTP-01 ACME Challenge Fix (v19.3.8)

Issue

Non-TLS connections on ports configured in useHttpProxy were not being forwarded to HttpProxy. This caused ACME HTTP-01 challenges to fail when the ACME port (usually 80) was included in useHttpProxy.

Root Cause

In the RouteConnectionHandler.handleForwardAction method, only connections with TLS settings (mode: 'terminate' or 'terminate-and-reencrypt') were being forwarded to HttpProxy. Non-TLS connections were always handled as direct connections, even when the port was configured for HttpProxy.

Solution

Added a check for non-TLS connections on ports listed in useHttpProxy:

// No TLS settings - check if this port should use HttpProxy
const isHttpProxyPort = this.settings.useHttpProxy?.includes(record.localPort);

if (isHttpProxyPort && this.httpProxyBridge.getHttpProxy()) {
  // Forward non-TLS connections to HttpProxy if configured
  this.httpProxyBridge.forwardToHttpProxy(/*...*/);
  return;
}

Test Coverage

  • test/test.http-fix-unit.ts - Unit tests verifying the fix
  • Tests confirm that non-TLS connections on HttpProxy ports are properly forwarded
  • Tests verify that non-HttpProxy ports still use direct connections

Configuration Example

const proxy = new SmartProxy({
  useHttpProxy: [80], // Enable HttpProxy for port 80
  httpProxyPort: 8443,
  acme: {
    email: 'ssl@example.com',
    port: 80
  },
  routes: [
    // Your routes here
  ]
});

ACME Certificate Provisioning Timing Fix (v19.3.9)

Issue

Certificate provisioning would start before ports were listening, causing ACME HTTP-01 challenges to fail with connection refused errors.

Root Cause

SmartProxy initialization sequence:

  1. Certificate manager initialized → immediately starts provisioning
  2. Ports start listening (too late for ACME challenges)

Solution

Deferred certificate provisioning until after ports are ready:

// SmartCertManager.initialize() now skips automatic provisioning
// SmartProxy.start() calls provisionAllCertificates() directly after ports are listening
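
A simplified sketch of the corrected startup order inside SmartProxy.start() (method names other than provisionAllCertificates() are illustrative):

public async start(): Promise<void> {
  await this.startPortListeners();                   // 1. bind ports so HTTP-01 challenges can be answered
  await this.certManager.provisionAllCertificates(); // 2. only then request certificates
}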

Test Coverage

  • test/test.acme-timing-simple.ts - Verifies proper timing sequence

Migration

Update to v19.3.9+, no configuration changes needed.

Socket Handler Race Condition Fix (v19.5.0)

Issue

Initial data chunks were being emitted before async socket handlers had completed setup, causing data loss when handlers performed async operations before setting up data listeners.

Root Cause

The handleSocketHandlerAction method was using process.nextTick to emit initial chunks regardless of whether the handler was sync or async. This created a race condition where async handlers might not have their listeners ready when the initial data was emitted.

Solution

Differentiated between sync and async handlers:

const result = route.action.socketHandler(socket);

if (result instanceof Promise) {
  // Async handler - wait for completion before emitting initial data
  result.then(() => {
    if (initialChunk && initialChunk.length > 0) {
      socket.emit('data', initialChunk);
    }
  }).catch(/*...*/);
} else {
  // Sync handler - use process.nextTick as before
  if (initialChunk && initialChunk.length > 0) {
    process.nextTick(() => {
      socket.emit('data', initialChunk);
    });
  }
}

Test Coverage

  • test/test.socket-handler-race.ts - Specifically tests async handlers with delayed listener setup
  • Verifies that initial data is received even when handler sets up listeners after async work

Usage Note

Socket handlers require initial data from the client to trigger routing (not just a TLS handshake). Clients must send at least one byte of data for the handler to be invoked.
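
For example, a test client exercising a socket-handler route must write something after connecting (a minimal sketch; host and port are illustrative):

import * as net from 'net';

const client = net.connect(8443, 'localhost', () => {
  // Routing to the socket handler is only triggered once the first
  // byte arrives, so send at least one byte right away.
  client.write('\n');
});
client.on('data', (chunk) => console.log('reply:', chunk.toString()));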

Route-Specific Security Implementation (v19.5.3)

Issue

Route-specific security configurations (ipAllowList, ipBlockList, authentication) were defined in the route types but not enforced at runtime.

Root Cause

The RouteConnectionHandler only checked global IP validation but didn't enforce route-specific security rules after matching a route.

Solution

Added security checks after route matching:

// Apply route-specific security checks
const routeSecurity = route.action.security || route.security;
if (routeSecurity) {
  // Check IP allow/block lists
  if (routeSecurity.ipAllowList || routeSecurity.ipBlockList) {
    const isIPAllowed = this.securityManager.isIPAuthorized(
      remoteIP,
      routeSecurity.ipAllowList || [],
      routeSecurity.ipBlockList || []
    );
    
    if (!isIPAllowed) {
      socket.end();
      this.connectionManager.cleanupConnection(record, 'route_ip_blocked');
      return;
    }
  }
}

Test Coverage

  • test/test.route-security-unit.ts - Unit tests verifying SecurityManager.isIPAuthorized logic
  • Tests confirm IP allow/block lists work correctly with glob patterns

Configuration Example

const routes: IRouteConfig[] = [{
  name: 'secure-api',
  match: { ports: 8443, domains: 'api.example.com' },
  action: {
    type: 'forward',
    target: { host: 'localhost', port: 3000 },
    security: {
      ipAllowList: ['192.168.1.*', '10.0.0.0/8'],    // Allow internal IPs
      ipBlockList: ['192.168.1.100'],                // But block specific IP
      maxConnections: 100,                            // Per-route limit (TODO)
      authentication: {                               // HTTP-only, requires TLS termination
        type: 'basic',
        credentials: [{ username: 'api', password: 'secret' }]
      }
    }
  }
}];

Notes

  • IP lists support glob patterns (via minimatch): 192.168.*, 10.?.?.1
  • Block lists take precedence over allow lists
  • Authentication requires TLS termination (cannot be enforced on passthrough/direct connections)
  • Per-route connection limits are not yet implemented
  • Security is defined at the route level (route.security), not in the action
  • Route matching is based solely on match criteria; security is enforced after matching
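
A minimal sketch of the glob-based IP matching noted above (illustrative; SecurityManager.isIPAuthorized layers block-list precedence on top of this):

import { minimatch } from 'minimatch';

const ipMatchesList = (ip: string, patterns: string[]): boolean =>
  patterns.some((pattern) => minimatch(ip, pattern));

ipMatchesList('192.168.1.42', ['192.168.*']); // true
ipMatchesList('10.4.7.1', ['10.?.?.1']);      // true
ipMatchesList('172.16.0.1', ['192.168.*']);   // false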

Performance Issues Investigation (v19.5.3+)

Critical Blocking Operations Found

  1. Busy Wait Loop in ts/proxies/nftables-proxy/nftables-proxy.ts:235-238
    • Blocks entire event loop with while (Date.now() < waitUntil) {}
    • Should use await new Promise(resolve => setTimeout(resolve, delay))
  2. Synchronous Filesystem Operations
    • Certificate management uses fs.existsSync(), fs.mkdirSync(), fs.readFileSync()
    • NFTables proxy uses execSync() for system commands
    • Certificate store uses ensureDirSync(), fileExistsSync(), removeManySync()
  3. Memory Leak Risks
    • Several setInterval() calls without storing references for cleanup
    • Event listeners added without proper cleanup in error paths
    • Missing removeAllListeners() calls in some connection cleanup scenarios
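
The setInterval() risk above is avoided by keeping timer references around for cleanup, e.g.:

// Store the interval handle so it can be cleared on shutdown
class PollingExample {
  private pollTimer?: NodeJS.Timeout;

  public start(): void {
    this.pollTimer = setInterval(() => this.poll(), 1000);
  }

  public stop(): void {
    if (this.pollTimer) {
      clearInterval(this.pollTimer);
      this.pollTimer = undefined;
    }
    // Also detach any listeners registered in start() or error paths here
  }

  private poll(): void {
    // periodic work
  }
}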

Performance Recommendations

  • Replace all sync filesystem operations with async alternatives
  • Fix the busy wait loop immediately (critical event loop blocker)
  • Add proper cleanup for all timers and event listeners
  • Consider worker threads for CPU-intensive operations
  • See readme.problems.md for detailed analysis and recommendations

Performance Optimizations Implemented (Phase 1 - v19.6.0)

1. Async Utilities Created (ts/core/utils/async-utils.ts)

  • delay(): Non-blocking alternative to busy wait loops
  • retryWithBackoff(): Retry operations with exponential backoff
  • withTimeout(): Execute operations with timeout protection
  • parallelLimit(): Run async operations with concurrency control
  • debounceAsync(): Debounce async functions
  • AsyncMutex: Ensure exclusive access to resources
  • CircuitBreaker: Protect against cascading failures
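
A hypothetical usage sketch for a few of these utilities (signatures are assumptions based on the names; check ts/core/utils/async-utils.ts for the real API):

import { delay, retryWithBackoff, withTimeout } from './core/utils/async-utils.js';

// Illustrative flaky operation
const fetchConfig = async (): Promise<string> => '{"ok":true}';

async function demo(): Promise<void> {
  // Non-blocking pause instead of a busy wait loop
  await delay(1000);

  // Retry with exponential backoff (options shape assumed)
  const config = await retryWithBackoff(() => fetchConfig(), { maxRetries: 5 });

  // Fail fast if the operation exceeds the timeout (signature assumed)
  const timed = await withTimeout(fetchConfig(), 5000);

  console.log(config, timed);
}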

2. Filesystem Utilities Created (ts/core/utils/fs-utils.ts)

  • AsyncFileSystem: Complete async filesystem operations
    • exists(), ensureDir(), readFile(), writeFile()
    • readJSON(), writeJSON() with proper error handling
    • copyFile(), moveFile(), removeDir()
    • Stream creation and file listing utilities
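
A sketch of how these helpers might be used (method names from the list above; treating them as static methods is an assumption):

import { AsyncFileSystem } from './core/utils/fs-utils.js';

// Hypothetical state file handling built on the listed helpers
async function loadOrInitState(path: string): Promise<object> {
  await AsyncFileSystem.ensureDir('/tmp/smartproxy-state'); // hypothetical directory
  if (await AsyncFileSystem.exists(path)) {
    return AsyncFileSystem.readJSON(path);
  }
  const initial = { createdAt: Date.now() };
  await AsyncFileSystem.writeJSON(path, initial);
  return initial;
}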

3. Critical Fixes Applied

Busy Wait Loop Fixed

  • Location: ts/proxies/nftables-proxy/nftables-proxy.ts:235-238
  • Fix: Replaced while (Date.now() < waitUntil) {} with await delay(ms)
  • Impact: Unblocks event loop, massive performance improvement

Certificate Manager Migration

  • File: ts/proxies/http-proxy/certificate-manager.ts
  • Added async initialization method
  • Kept sync methods for backward compatibility with deprecation warnings
  • Added loadDefaultCertificatesAsync() method

Certificate Store Migration

  • File: ts/proxies/smart-proxy/cert-store.ts
  • Replaced all fileExistsSync, ensureDirSync, removeManySync
  • Used parallel operations with Promise.all() for better performance
  • Improved error handling and async JSON operations

NFTables Proxy Improvements

  • Added deprecation warnings to sync methods
  • Created executeWithTempFile() helper for common pattern
  • Started migration of sync filesystem operations to async
  • Added import for delay and AsyncFileSystem utilities

4. Backward Compatibility Maintained

  • All sync methods retained with deprecation warnings
  • Existing APIs unchanged, new async methods added alongside
  • Feature flags prepared for gradual rollout

5. Phase 1 Completion Status

Phase 1 COMPLETE - All critical performance fixes have been implemented:

  • Fixed busy wait loop in nftables-proxy.ts
  • Created async utilities (delay, retry, timeout, parallelLimit, mutex, circuit breaker)
  • Created filesystem utilities (AsyncFileSystem with full async operations)
  • Migrated all certificate management to async operations
  • Migrated nftables-proxy filesystem operations to async (except stopSync for exit handlers)
  • All tests passing for new utilities

6. Phase 2 Progress Status

🔨 Phase 2 IN PROGRESS - Resource Lifecycle Management:

  • Created LifecycleComponent base class for automatic resource cleanup
  • Created BinaryHeap data structure for priority queue operations
  • Created EnhancedConnectionPool with backpressure and health checks
  • Cleaned up legacy code (removed ts/common/, event-utils.ts, event-system.ts)
  • 📋 TODO: Migrate existing components to extend LifecycleComponent
  • 📋 TODO: Add integration tests for resource management
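
Illustrative only: one plausible shape for the LifecycleComponent base class mentioned above (the actual implementation may differ):

// Hypothetical sketch of automatic resource cleanup
abstract class LifecycleComponentSketch {
  private timers: NodeJS.Timeout[] = [];
  private cleanupFns: Array<() => void> = [];

  protected trackInterval(fn: () => void, ms: number): NodeJS.Timeout {
    const timer = setInterval(fn, ms);
    this.timers.push(timer);
    return timer;
  }

  protected trackCleanup(fn: () => void): void {
    this.cleanupFns.push(fn);
  }

  public async destroy(): Promise<void> {
    this.timers.forEach((t) => clearInterval(t));
    this.timers = [];
    this.cleanupFns.forEach((fn) => fn());
    this.cleanupFns = [];
  }
}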

7. Next Steps (Remaining Work)

  • Phase 2 (cont): Migrate components to use LifecycleComponent
  • Phase 3: Add worker threads for CPU-intensive operations
  • Phase 4: Performance monitoring dashboard

Socket Error Handling Fix (v19.5.11+)

Issue

The server crashed with an unhandled 'error' event when backend connections failed (ECONNREFUSED). This also caused a memory leak: the active connection count kept rising because failed connections were never cleaned up.

Root Cause

  1. Race Condition: In forwarding handlers, sockets were created with net.connect() but error handlers were attached later, creating a window where errors could crash the server
  2. Incomplete Cleanup: When server connections failed, client sockets weren't properly cleaned up, leaving connection records in memory

Solution

Created createSocketWithErrorHandler() utility that attaches error handlers immediately:

// Before (race condition):
const socket = net.connect(port, host);
// ... other code ...
socket.on('error', handler); // Too late!

// After (safe):
const socket = createSocketWithErrorHandler({
  port, host,
  onError: (error) => {
    // Handle error immediately
    clientSocket.destroy();
  },
  onConnect: () => {
    // Set up forwarding
  }
});

Changes Made

  1. New Utility: ts/core/utils/socket-utils.ts - Added createSocketWithErrorHandler()
  2. Updated Handlers:
    • https-passthrough-handler.ts - Uses safe socket creation
    • https-terminate-to-http-handler.ts - Uses safe socket creation
  3. Connection Cleanup: Client sockets destroyed immediately on server connection failure

Test Coverage

  • test/test.socket-error-handling.node.ts - Verifies server doesn't crash on ECONNREFUSED
  • test/test.forwarding-error-fix.node.ts - Tests forwarding handlers handle errors gracefully

Configuration

No configuration changes needed. The fix is transparent to users.

Important Note

The fix was applied in two places:

  1. ForwardingHandler classes (https-passthrough-handler.ts, etc.) - These are standalone forwarding utilities
  2. SmartProxy route-connection-handler (route-connection-handler.ts) - This is where the actual SmartProxy connection handling happens

The critical fix for SmartProxy was in setupDirectConnection() method in route-connection-handler.ts, which now uses createSocketWithErrorHandler() to properly handle connection failures and clean up connection records.

Connection Cleanup Improvements (v19.5.12+)

Issue

The active connection count kept climbing during rapid retry scenarios, especially when routing failed or backend connections were refused. This was due to:

  1. Delayed Cleanup: Using initiateCleanupOnce queued cleanup operations (batch of 100 every 100ms) instead of immediate cleanup
  2. NFTables Memory Leak: NFTables connections were never cleaned up, staying in memory forever
  3. Connection Limit Bypass: When max connections reached, connection record check happened after creation

Root Cause Analysis

  1. Queued vs Immediate Cleanup:
    • initiateCleanupOnce(): Adds to cleanup queue, processes up to 100 connections every 100ms
    • cleanupConnection(): Immediate synchronous cleanup
    • Under rapid retries, connections were created faster than the queue could process them
  2. NFTables Connections:
    • Marked with usingNetworkProxy = true but never cleaned up
    • Connection records stayed in memory indefinitely
  3. Error Path Cleanup:
    • Many error paths used socket.end() (async) followed by cleanup
    • Created timing windows where connections weren't fully cleaned

Solution

  1. Immediate Cleanup: Changed all error paths from initiateCleanupOnce() to cleanupConnection() for immediate cleanup
  2. NFTables Cleanup: Added socket close listener to clean up connection records when NFTables connections close
  3. Connection Limit Fix: Added null check after createConnection() to handle rejection properly

Changes Made in route-connection-handler.ts

// 1. NFTables cleanup (line 551-553)
socket.once('close', () => {
  this.connectionManager.cleanupConnection(record, 'nftables_closed');
});

// 2. Connection limit check (line 93-96)
const record = this.connectionManager.createConnection(socket);
if (!record) {
  // Connection was rejected due to limit - socket already destroyed
  return;
}

// 3. Changed all error paths to use immediate cleanup
// Before: this.connectionManager.initiateCleanupOnce(record, reason)
// After: this.connectionManager.cleanupConnection(record, reason)

Test Coverage

  • test/test.rapid-retry-cleanup.node.ts - Verifies connection cleanup under rapid retry scenarios
  • Test shows the connection count stays at 0 even with 20 rapid retries at 50ms intervals
  • Confirms both ECONNREFUSED and routing failure scenarios are handled correctly

Performance Impact

  • Positive: No more connection accumulation under load
  • Positive: Immediate cleanup reduces memory usage
  • Consideration: More frequent cleanup operations, but prevents queue backlog

Migration Notes

No configuration changes needed. The improvements are automatic and backward compatible.

Early Client Disconnect Handling (v19.5.13+)

Issue

Connections were accumulating when clients connected but disconnected before sending data or during routing. This occurred in two scenarios:

  1. TLS Path: Clients connecting and disconnecting before sending initial TLS handshake data
  2. Non-TLS Immediate Routing: Clients disconnecting while backend connection was being established

Root Cause

  1. Missing Cleanup Handlers: During initial data wait and immediate routing, no close/end handlers were attached to catch early disconnections
  2. Race Condition: Backend connection attempts continued even after client disconnected, causing unhandled errors
  3. Timing Window: Between accepting a connection and establishing the full bidirectional flow, disconnections weren't properly handled

Solution

  1. TLS Path Fix: Added close/end handlers during initial data wait (lines 224-253 in route-connection-handler.ts)
  2. Immediate Routing Fix: Used setupSocketHandlers for proper handler attachment (lines 180-205)
  3. Backend Error Handling: Check if connection already closed before handling backend errors (line 1144)

Changes Made

// 1. TLS path - handle disconnect before initial data
socket.once('close', () => {
  if (!initialDataReceived) {
    this.connectionManager.cleanupConnection(record, 'closed_before_data');
  }
});

// 2. Immediate routing path - proper handler setup
setupSocketHandlers(socket, (reason) => {
  if (!record.outgoing || record.outgoing.readyState !== 'open') {
    if (record.outgoing && !record.outgoing.destroyed) {
      record.outgoing.destroy(); // Abort pending backend connection
    }
    this.connectionManager.cleanupConnection(record, reason);
  }
}, undefined, 'immediate-route-client');

// 3. Backend connection error handling
onError: (error) => {
  if (record.connectionClosed) {
    logger.log('debug', 'Backend connection failed but client already disconnected');
    return; // Client already gone, nothing to clean up
  }
  // ... normal error handling
}

Test Coverage

  • test/test.connect-disconnect-cleanup.node.ts - Comprehensive test for early disconnect scenarios
  • Tests verify connection count stays at 0 even with rapid connect/disconnect patterns
  • Covers immediate disconnect, delayed disconnect, and mixed patterns

Performance Impact

  • Positive: No more connection accumulation from early disconnects
  • Positive: Immediate cleanup reduces memory usage
  • Positive: Prevents resource exhaustion from rapid reconnection attempts

Migration Notes

No configuration changes needed. The fix is automatic and backward compatible.