SmartProxy Development Plan

Critical Bug Fix: Port 80 EADDRINUSE with ACME Challenge Routes

Problem Statement

SmartProxy encounters an "EADDRINUSE" error on port 80 when provisioning multiple ACME certificates. The issue occurs because the certificate manager adds and removes the challenge route for each certificate individually, causing race conditions when multiple certificates are provisioned concurrently.

Root Cause

The SmartCertManager class adds the ACME challenge route (port 80) before provisioning each certificate and removes it afterward. When multiple certificates are provisioned:

  1. Each provisioning cycle adds its own challenge route
  2. This triggers updateRoutes(), which calls PortManager.updatePorts()
  3. Port 80 is repeatedly added and removed, so overlapping bind attempts fail with EADDRINUSE
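
The failure mode above can be reproduced in isolation. The following is a minimal sketch, not SmartProxy code: FakePortManager and provisionWithOwnRoute are hypothetical stand-ins, and timers simulate bind and ACME latency. It only illustrates why per-certificate add/remove of the port 80 route collides as soon as two provisioning cycles overlap.

```typescript
// Hypothetical stand-in for a port manager; NOT SmartProxy's actual PortManager.
class FakePortManager {
  private bound = new Set<number>();
  private binding = new Set<number>();

  // Simulates an asynchronous listen(): binding a port that is already bound,
  // or whose bind is still in flight, fails the way the OS would.
  async addPort(port: number): Promise<void> {
    if (this.bound.has(port) || this.binding.has(port)) {
      throw Object.assign(new Error(`listen EADDRINUSE: port ${port}`), {
        code: 'EADDRINUSE',
      });
    }
    this.binding.add(port);
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated bind latency
    this.binding.delete(port);
    this.bound.add(port);
  }

  async removePort(port: number): Promise<void> {
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated close latency
    this.bound.delete(port);
  }
}

// Old flow (assumed shape): every certificate adds and removes its own challenge route.
async function provisionWithOwnRoute(ports: FakePortManager, domain: string): Promise<string> {
  await ports.addPort(80);
  try {
    await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated ACME exchange
    return domain;
  } finally {
    await ports.removePort(80);
  }
}

// Provisions two certificates concurrently and reports the outcome of each.
async function demoRace(): Promise<string[]> {
  const ports = new FakePortManager();
  const outcomes = ['a.example.com', 'b.example.com'].map((domain) =>
    provisionWithOwnRoute(ports, domain).then(
      () => 'ok',
      (err: { code?: string }) => err.code ?? 'unknown',
    ),
  );
  return Promise.all(outcomes);
}
```

In this simulation, demoRace() resolves to ['ok', 'EADDRINUSE']: the second certificate tries to bind port 80 while the first bind is still in flight.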

Implementation Plan

Phase 1: Refactor Challenge Route Lifecycle

  1. Modify challenge route handling in SmartCertManager
    • Add challenge route once during initialization if ACME is configured
    • Keep challenge route active throughout entire certificate provisioning
    • Remove challenge route only after all certificates are provisioned
    • Add concurrency control to prevent multiple simultaneous route updates

Phase 2: Update Certificate Provisioning Flow

  1. Refactor certificate provisioning methods
    • Separate challenge route management from individual certificate provisioning
    • Update provisionAcmeCertificate() so that it no longer adds or removes challenge routes
    • Modify provisionAllCertificates() to handle challenge route lifecycle
    • Add error handling for challenge route initialization failures
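
As a rough sketch of this separation, provisionAllCertificates() could own the challenge route while provisionAcmeCertificate() only assumes it is present. The class below is a simplified, hypothetical stand-in: the method names mirror the plan, but the bodies, the events log, and the challengeRouteActive flag are assumptions made for illustration.

```typescript
// Simplified, hypothetical manager; method names follow the plan, bodies are stand-ins.
class ChallengeRouteLifecycleSketch {
  public events: string[] = []; // illustration only: records the order of operations
  private challengeRouteActive = false;

  private async addChallengeRoute(): Promise<void> {
    if (this.challengeRouteActive) return; // never add a duplicate route
    this.challengeRouteActive = true;
    this.events.push('add-route');
  }

  private async removeChallengeRoute(): Promise<void> {
    if (!this.challengeRouteActive) return;
    this.challengeRouteActive = false;
    this.events.push('remove-route');
  }

  // No longer adds or removes the challenge route; it only relies on it.
  private async provisionAcmeCertificate(domain: string): Promise<void> {
    if (!this.challengeRouteActive) {
      throw new Error(`challenge route not active while provisioning ${domain}`);
    }
    this.events.push(`provision:${domain}`);
  }

  // Owns the route lifecycle: add once, provision everything, remove once.
  async provisionAllCertificates(domains: string[]): Promise<void> {
    await this.addChallengeRoute();
    try {
      for (const domain of domains) {
        await this.provisionAcmeCertificate(domain);
      }
    } finally {
      // Runs on success and on failure, so the route is not leaked.
      await this.removeChallengeRoute();
    }
  }
}
```

The try/finally is the design choice that matters here: the route is removed exactly once, even if a certificate fails partway through. In this sketch the first failure aborts the remaining domains; whether to continue instead is left open.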

Phase 3: Implement Concurrency Controls

  1. Add synchronization mechanisms
    • Implement mutex/lock for challenge route operations
    • Ensure certificate provisioning is properly serialized
    • Add safeguards against duplicate challenge routes
    • Handle edge cases (shutdown during provisioning, renewal conflicts)
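
One dependency-free way to sketch the lock and the duplicate-route safeguard is a promise-chain mutex. Mutex, GuardedChallengeRoute, and bindCount are hypothetical names used only for this illustration, and the timers stand in for the real port operations.

```typescript
// Minimal promise-chain mutex: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the internal chain so one failed task
    // does not block later ones; callers still see it via `result`.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Hypothetical wrapper that serializes route operations and ignores duplicates.
class GuardedChallengeRoute {
  private readonly lock = new Mutex();
  private active = false;
  public bindCount = 0; // illustration only: how often port 80 was actually bound

  add(): Promise<void> {
    return this.lock.runExclusive(async () => {
      if (this.active) return; // duplicate add becomes a no-op
      await new Promise<void>((resolve) => setTimeout(resolve, 5)); // stands in for binding port 80
      this.active = true;
      this.bindCount++;
    });
  }

  remove(): Promise<void> {
    return this.lock.runExclusive(async () => {
      if (!this.active) return; // nothing to remove
      await new Promise<void>((resolve) => setTimeout(resolve, 5)); // stands in for closing port 80
      this.active = false;
    });
  }
}
```

With this shape, three concurrent add() calls bind the port once: the first performs the bind, and the other two see the active flag after acquiring the lock and return.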

Phase 4: Enhance Error Handling

  1. Improve error handling and recovery
    • Add specific error types for port conflicts
    • Implement retry logic for transient port binding issues
    • Add detailed logging for challenge route lifecycle
    • Ensure proper cleanup on errors
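
A dedicated error type and a bounded retry could look roughly like the sketch below. PortConflictError, isAddrInUse, and bindWithRetry are hypothetical names, not existing SmartProxy APIs, and the retry count and delay are placeholder values. The only external fact relied on is that Node.js reports a failed listen on an occupied port with the error code 'EADDRINUSE'.

```typescript
// Hypothetical error type for port conflicts; not an existing SmartProxy class.
class PortConflictError extends Error {
  constructor(public readonly port: number, public readonly original?: unknown) {
    super(`Port ${port} is already in use; the ACME challenge route could not be bound`);
    this.name = 'PortConflictError';
    Object.setPrototypeOf(this, PortConflictError.prototype); // keeps instanceof working on older targets
  }
}

// True if the error carries the EADDRINUSE code.
function isAddrInUse(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === 'EADDRINUSE'
  );
}

// Retries `bind` on port conflicts with a linearly growing delay.
// Other errors are rethrown immediately; the last conflict is wrapped.
async function bindWithRetry(
  bind: () => Promise<void>,
  port: number,
  attempts = 3,
  delayMs = 100,
): Promise<void> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await bind();
      return;
    } catch (err) {
      if (!isAddrInUse(err)) throw err; // only port conflicts are treated as transient
      if (attempt === attempts) throw new PortConflictError(port, err);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

A caller would wrap the challenge route setup, for example bindWithRetry(() => addChallengeRoute(), 80), and catch PortConflictError to log a targeted troubleshooting hint before cleaning up.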

Phase 5: Create Comprehensive Tests

  1. Write tests for challenge route management
    • Test concurrent certificate provisioning
    • Test challenge route persistence during provisioning
    • Test error scenarios (port already in use)
    • Test cleanup after provisioning
    • Test renewal scenarios with existing challenge routes

Phase 6: Update Documentation

  1. Document the new behavior
    • Update certificate management documentation
    • Add troubleshooting guide for port conflicts
    • Document the challenge route lifecycle
    • Include examples of proper ACME configuration

Technical Details

Specific Code Changes

  1. In SmartCertManager.initialize():

    // Add challenge route once at initialization
    if (hasAcmeRoutes && this.acmeOptions?.email) {
      await this.addChallengeRoute();
    }
    
  2. Modify provisionAcmeCertificate():

    // Remove these lines:
    // await this.addChallengeRoute();
    // await this.removeChallengeRoute();
    
  3. Update stop() method:

    // Always remove challenge route on shutdown
    if (this.challengeRoute) {
      await this.removeChallengeRoute();
    }
    
  4. Add concurrency control:

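    // Assumption: AsyncLock is the class from the `async-lock` npm package
    // (import AsyncLock from 'async-lock'); acquire(key, fn) serializes fn per key.
    private challengeRouteLock = new AsyncLock();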
    
    private async manageChallengeRoute(operation: 'add' | 'remove'): Promise<void> {
      await this.challengeRouteLock.acquire('challenge-route', async () => {
        if (operation === 'add') {
          await this.addChallengeRoute();
        } else {
          await this.removeChallengeRoute();
        }
      });
    }
    

Success Criteria

  • No EADDRINUSE errors when provisioning multiple certificates
  • Challenge route remains active during entire provisioning cycle
  • Port 80 is only bound once per SmartProxy instance
  • Proper cleanup on shutdown or error
  • All tests pass
  • Documentation clearly explains the behavior

Implementation Summary

The port 80 EADDRINUSE issue has been fixed through the following changes:

  1. Challenge Route Lifecycle: The challenge route is now added once during initialization and stays active throughout certificate provisioning
  2. Concurrency Control: Added flags to prevent concurrent provisioning and duplicate challenge route operations
  3. Error Handling: Added clearer error messages for port conflicts and proper cleanup on errors
  4. Tests: Created a comprehensive test suite for challenge route lifecycle scenarios
  5. Documentation: Updated the certificate management guide with a troubleshooting section for port conflicts

The fix ensures that port 80 is only bound once, preventing EADDRINUSE errors during concurrent certificate provisioning operations.

Timeline

  • Phase 1: 2 hours (Challenge route lifecycle)
  • Phase 2: 1 hour (Provisioning flow)
  • Phase 3: 2 hours (Concurrency controls)
  • Phase 4: 1 hour (Error handling)
  • Phase 5: 2 hours (Testing)
  • Phase 6: 1 hour (Documentation)

Total estimated time: 9 hours

Notes

  • This is a critical bug affecting ACME certificate provisioning
  • The fix requires careful handling of concurrent operations
  • Backward compatibility must be maintained
  • Consider impact on renewal operations and edge cases