
SmartProxy Performance Issues Report

Executive Summary

This report identifies performance issues and blocking operations in the SmartProxy codebase that could impact scalability and responsiveness under high load.

Critical Issues

1. Synchronous Filesystem Operations

These operations block the event loop and should be replaced with async alternatives:

Certificate Management

  • ts/proxies/http-proxy/certificate-manager.ts:29: fs.existsSync()
  • ts/proxies/http-proxy/certificate-manager.ts:30: fs.mkdirSync()
  • ts/proxies/http-proxy/certificate-manager.ts:49-50: fs.readFileSync() for loading certificates

NFTables Proxy

  • ts/proxies/nftables-proxy/nftables-proxy.ts: Multiple uses of execSync() for system commands
  • ts/proxies/nftables-proxy/nftables-proxy.ts: Multiple fs.writeFileSync() and fs.unlinkSync() operations
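
The execSync() calls can be replaced with the promisified execFile() from node:child_process, which keeps the event loop free while nft runs. A minimal sketch (the helper name, command arguments, and temp-file path are illustrative, not taken from the codebase):

  import { execFile } from 'node:child_process';
  import { promisify } from 'node:util';
  import { writeFile, unlink } from 'node:fs/promises';

  const execFileAsync = promisify(execFile);

  // Hypothetical helper: apply an nftables ruleset from a temp file without
  // blocking the event loop (replaces execSync + writeFileSync/unlinkSync).
  async function applyRuleset(rulesetText: string, tmpPath = '/tmp/smartproxy.nft'): Promise<void> {
    await writeFile(tmpPath, rulesetText, 'utf8');
    try {
      await execFileAsync('nft', ['-f', tmpPath]);
    } finally {
      await unlink(tmpPath).catch(() => {}); // best-effort temp-file cleanup
    }
  }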

Certificate Store

  • ts/proxies/smart-proxy/cert-store.ts:8: ensureDirSync()
  • ts/proxies/smart-proxy/cert-store.ts:15,31,76: fileExistsSync()
  • ts/proxies/smart-proxy/cert-store.ts:77: removeManySync()
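
Each of these helpers has a straightforward non-blocking equivalent in node:fs/promises. A sketch under the assumption that the helpers behave as their names suggest (the real utilities used by cert-store.ts may differ):

  import { mkdir, access, rm } from 'node:fs/promises';

  // Async stand-ins (names illustrative) for the synchronous helpers above.
  const ensureDir = (dir: string) => mkdir(dir, { recursive: true });

  const fileExists = async (path: string): Promise<boolean> => {
    try { await access(path); return true; } catch { return false; }
  };

  const removeMany = (paths: string[]) =>
    Promise.all(paths.map((p) => rm(p, { force: true })));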

2. Event Loop Blocking Operations

Busy Wait Loop

  • ts/proxies/nftables-proxy/nftables-proxy.ts:235-238:
    const waitUntil = Date.now() + retryDelayMs;
    while (Date.now() < waitUntil) {
      // busy wait - blocks event loop completely
    }
    
    This is extremely problematic: for the entire retry delay, the Node.js event loop is blocked and no other connections can be serviced.

3. Potential Memory Leaks

Timer Management Issues

Several timers are created without proper cleanup:

  • ts/proxies/http-proxy/function-cache.ts: setInterval() created without storing a reference, so it can never be cleared
  • ts/proxies/http-proxy/request-handler.ts: setInterval() for rate-limit cleanup is itself never cleared
  • ts/core/utils/shared-security-manager.ts: cleanupInterval is stored, but no cleanup method ever clears it

Event Listener Accumulation

  • Event listeners are added in multiple places without corresponding removal
  • Connection handlers attach listeners that are not always removed when the connection closes
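
The usual fix is to route every terminal event through one cleanup function that detaches the handlers. A minimal sketch of the pattern (names are illustrative, not taken from the codebase):

  import type { Socket } from 'node:net';

  // Attach handlers once and guarantee they are removed exactly once,
  // no matter which terminal event fires first.
  function trackConnection(socket: Socket, onData: (chunk: Buffer) => void): void {
    const onError = (err: Error) => {
      // log and fall through; 'close' always follows and runs cleanup
      console.error('connection error:', err.message);
    };
    const cleanup = () => {
      socket.removeListener('data', onData);
      socket.removeListener('error', onError);
    };

    socket.on('data', onData);
    socket.on('error', onError);
    socket.once('close', cleanup);
  }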

4. Connection Pool Management

ConnectionPool (ts/proxies/http-proxy/connection-pool.ts)

Good practices observed:

  • Proper connection lifecycle management
  • Periodic cleanup of idle connections
  • Connection limits enforcement

Potential issues:

  • No backpressure mechanism when pool is full
  • Synchronous sorting operation in cleanupConnectionPool() could be slow with many connections
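
For the missing backpressure, one possible direction (a hypothetical interface, not the existing ConnectionPool API) is to park acquire() callers in a FIFO queue when the pool is exhausted and hand them connections as they are released:

  import type { Socket } from 'node:net';

  // Sketch of backpressure for a full pool: acquire() waits in a queue
  // instead of failing or creating sockets beyond the limit.
  class BoundedPool {
    private idle: Socket[] = [];
    private waiters: Array<(socket: Socket) => void> = [];
    private size = 0;

    constructor(
      private readonly maxSize: number,
      private readonly create: () => Promise<Socket>,
    ) {}

    async acquire(): Promise<Socket> {
      const socket = this.idle.pop();
      if (socket) return socket;
      if (this.size < this.maxSize) {
        this.size++;
        return this.create();
      }
      // Pool exhausted: wait for a release instead of erroring out.
      return new Promise<Socket>((resolve) => this.waiters.push(resolve));
    }

    release(socket: Socket): void {
      const waiter = this.waiters.shift();
      if (waiter) waiter(socket); // hand the connection straight to a waiter
      else this.idle.push(socket);
    }
  }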

5. Resource Management Issues

Socket Cleanup

  • Some error paths don't properly clean up sockets
  • Missing removeAllListeners() in some error scenarios could lead to memory leaks

Timeout Management

  • Inconsistent timeout handling across different components
  • Some sockets created without timeout settings
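
A simple way to make timeouts uniform is to route every new socket through one helper that applies the timeout and the teardown path. A sketch with illustrative values:

  import type { Socket } from 'node:net';

  // Single place where idle timeouts and teardown are configured,
  // so no socket is created without them.
  function configureSocket(socket: Socket, timeoutMs = 30_000): Socket {
    socket.setTimeout(timeoutMs);
    socket.once('timeout', () => {
      // destroy() emits 'close', which should trigger the normal cleanup path
      socket.destroy(new Error(`socket idle for ${timeoutMs}ms`));
    });
    socket.once('error', () => socket.destroy());
    return socket;
  }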

6. JSON Operations on Large Objects

  • ts/proxies/smart-proxy/cert-store.ts:21: JSON.parse() on certificate metadata
  • ts/proxies/smart-proxy/cert-store.ts:71: JSON.stringify() with pretty printing
  • ts/proxies/http-proxy/function-cache.ts:76: JSON.stringify() for cache keys (called frequently)

Recommendations

Immediate Actions (High Priority)

  1. Replace Synchronous Operations

    // Instead of:
    if (fs.existsSync(path)) { ... }
    
    // Use:
    try {
      await fs.promises.access(path);
      // file exists
    } catch {
      // file doesn't exist
    }
    
  2. Fix Busy Wait Loop

    // Instead of:
    while (Date.now() < waitUntil) { }
    
    // Use:
    await new Promise(resolve => setTimeout(resolve, retryDelayMs));
    
  3. Add Timer Cleanup

    class Component {
      private cleanupTimer?: NodeJS.Timeout;
    
      start() {
        this.cleanupTimer = setInterval(() => { ... }, 60000);
      }
    
      stop() {
        if (this.cleanupTimer) {
          clearInterval(this.cleanupTimer);
          this.cleanupTimer = undefined;
        }
      }
    }
    

Medium Priority

  1. Optimize JSON Operations

    • Cache JSON.stringify results for frequently used objects
    • Consider using faster hashing for cache keys (e.g., crypto.createHash); see the sketch after this list
    • Use streaming JSON parsers for large objects
  2. Improve Connection Pool

    • Implement backpressure/queueing when pool is full
    • Use a heap or priority queue for connection management instead of sorting
  3. Standardize Resource Cleanup

    • Create a base class for components with lifecycle management
    • Ensure all event listeners are removed on cleanup
    • Add abort controllers for better cancellation support
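
For the cache-key case, a hedged sketch (the real key shape in function-cache.ts is not shown here): hash the stringified input once and use the short, fixed-length digest as the map key:

  import { createHash } from 'node:crypto';

  // Hypothetical helper: a fixed-length digest instead of the full
  // JSON.stringify output as the cache key. Note that JSON.stringify still
  // runs; memoize it per object if the same inputs recur frequently.
  function cacheKey(input: unknown): string {
    const json = JSON.stringify(input);
    return createHash('sha1').update(json).digest('hex');
  }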

Long-term Improvements

  1. Worker Threads

    • Move CPU-intensive operations to worker threads
    • Consider using worker pools for NFTables operations
  2. Monitoring and Metrics

    • Add performance monitoring for event loop lag (see the sketch after this list)
    • Track connection pool utilization
    • Monitor memory usage patterns
  3. Graceful Degradation

    • Implement circuit breakers for backend connections
    • Add request queuing with overflow protection
    • Implement adaptive timeout strategies
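
Event-loop lag in particular can be measured with Node's built-in perf_hooks histogram. A minimal sketch (thresholds and intervals are illustrative):

  import { monitorEventLoopDelay } from 'node:perf_hooks';

  // Built-in event-loop delay histogram; values are reported in nanoseconds.
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  setInterval(() => {
    const p99Ms = histogram.percentile(99) / 1e6;
    if (p99Ms > 100) {
      console.warn(`event loop p99 lag ${p99Ms.toFixed(1)}ms - possible blocking work`);
    }
    histogram.reset();
  }, 10_000).unref(); // unref() so monitoring never keeps the process alive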

Impact Assessment

These issues primarily affect:

  • Scalability: Blocking operations limit concurrent connection handling
  • Responsiveness: Event loop blocking causes latency spikes
  • Stability: Memory leaks could cause crashes under sustained load
  • Resource Usage: Inefficient resource management increases memory/CPU usage

Testing Recommendations

  1. Load test with high connection counts (10k+ concurrent); see the sketch after this list
  2. Monitor event loop lag under stress
  3. Test long-running scenarios to detect memory leaks
  4. Benchmark async vs. sync implementations to measure the improvement
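
For the load test, one option is a scripted run with the third-party autocannon package (an assumption, not a tool the project currently uses); the URL and numbers below are illustrative:

  import autocannon from 'autocannon';

  // Drive high concurrency against a local SmartProxy instance and report
  // aggregate latency and error counts.
  async function loadTest(): Promise<void> {
    const result = await autocannon({
      url: 'http://localhost:8080', // illustrative target
      connections: 10_000,          // high concurrency to surface blocking behaviour
      duration: 60,                 // seconds
    });
    console.log('avg latency (ms):', result.latency.average);
    console.log('errors:', result.errors, 'timeouts:', result.timeouts);
  }

  loadTest();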

Conclusion

While SmartProxy has a sound architectural design and follows many best practices, the identified blocking operations and resource-management issues could significantly degrade performance under high load. The most critical issues, the busy-wait loop and the synchronous filesystem operations, should be addressed immediately.