fix(acme): Refactor ACME challenge route lifecycle to prevent port 80 EADDRINUSE errors

This commit is contained in:
2025-05-19 01:59:52 +00:00
parent a54cbf7417
commit 094edfafd1
6 changed files with 559 additions and 92 deletions

View File

@ -2,100 +2,136 @@
cat /home/philkunz/.claude/CLAUDE.md
## Critical Bug Fix: Missing Route Update Callback in updateRoutes Method
## Critical Bug Fix: Port 80 EADDRINUSE with ACME Challenge Routes
### Problem Statement
SmartProxy v19.2.2 has a bug where the ACME certificate manager loses its route update callback when routes are dynamically updated. This causes the error "No route update callback set" and prevents automatic certificate acquisition.
SmartProxy encounters an "EADDRINUSE" error on port 80 when provisioning multiple ACME certificates. The issue occurs because the certificate manager adds and removes the challenge route for each certificate individually, causing race conditions when multiple certificates are provisioned concurrently.
### Root Cause
When `updateRoutes()` creates a new SmartCertManager instance, it fails to set the route update callback that's required for ACME challenges. This callback is properly set in `initializeCertificateManager()` but is missing from the route update flow.
The `SmartCertManager` class adds the ACME challenge route (port 80) before provisioning each certificate and removes it afterward. When multiple certificates are provisioned:
1. Each provisioning cycle adds its own challenge route
2. This triggers `updateRoutes()` which calls `PortManager.updatePorts()`
3. Port 80 is repeatedly added/removed, causing binding conflicts
### Implementation Plan
#### Phase 1: Fix the Bug
1. **Update the updateRoutes method** in `/mnt/data/lossless/push.rocks/smartproxy/ts/proxies/smart-proxy/smart-proxy.ts`
- [ ] Add the missing callback setup before initializing the new certificate manager
- [ ] Ensure the callback is set after creating the new SmartCertManager instance
#### Phase 1: Refactor Challenge Route Lifecycle
1. **Modify challenge route handling** in `SmartCertManager`
- [ ] Add challenge route once during initialization if ACME is configured
- [ ] Keep challenge route active throughout entire certificate provisioning
- [ ] Remove challenge route only after all certificates are provisioned
- [ ] Add concurrency control to prevent multiple simultaneous route updates
#### Phase 2: Create Tests
2. **Write comprehensive tests** for the route update functionality
- [ ] Create test file: `test/test.route-update-callback.node.ts`
- [ ] Test that callback is preserved when routes are updated
- [ ] Test that ACME challenges work after route updates
- [ ] Test edge cases (multiple updates, no cert manager, etc.)
#### Phase 2: Update Certificate Provisioning Flow
2. **Refactor certificate provisioning methods**
- [ ] Separate challenge route management from individual certificate provisioning
- [ ] Update `provisionAcmeCertificate()` to not add/remove challenge routes
- [ ] Modify `provisionAllCertificates()` to handle challenge route lifecycle
- [ ] Add error handling for challenge route initialization failures
#### Phase 3: Enhance Documentation
3. **Update documentation** to clarify the route update behavior
- [ ] Add section to certificate-management.md about dynamic route updates
- [ ] Document the callback requirement for ACME challenges
- [ ] Include example of proper route update implementation
#### Phase 3: Implement Concurrency Controls
3. **Add synchronization mechanisms**
- [ ] Implement mutex/lock for challenge route operations
- [ ] Ensure certificate provisioning is properly serialized
- [ ] Add safeguards against duplicate challenge routes
- [ ] Handle edge cases (shutdown during provisioning, renewal conflicts)
#### Phase 4: Prevent Future Regressions
4. **Refactor for better maintainability**
- [ ] Consider extracting certificate manager setup to a shared method
- [ ] Add code comments explaining the callback requirement
- [ ] Consider making the callback setup more explicit in the API
#### Phase 4: Enhance Error Handling
4. **Improve error handling and recovery**
- [ ] Add specific error types for port conflicts
- [ ] Implement retry logic for transient port binding issues
- [ ] Add detailed logging for challenge route lifecycle
- [ ] Ensure proper cleanup on errors
#### Phase 5: Create Comprehensive Tests
5. **Write tests for challenge route management**
- [ ] Test concurrent certificate provisioning
- [ ] Test challenge route persistence during provisioning
- [ ] Test error scenarios (port already in use)
- [ ] Test cleanup after provisioning
- [ ] Test renewal scenarios with existing challenge routes
#### Phase 6: Update Documentation
6. **Document the new behavior**
- [ ] Update certificate management documentation
- [ ] Add troubleshooting guide for port conflicts
- [ ] Document the challenge route lifecycle
- [ ] Include examples of proper ACME configuration
### Technical Details
#### Specific Code Changes
1. In `updateRoutes()` method (around line 535), add:
1. In `SmartCertManager.initialize()`:
```typescript
// Set route update callback for ACME challenges
this.certManager.setUpdateRoutesCallback(async (routes) => {
await this.updateRoutes(routes);
});
// Add challenge route once at initialization
if (hasAcmeRoutes && this.acmeOptions?.email) {
await this.addChallengeRoute();
}
```
2. Consider refactoring the certificate manager setup into a helper method to avoid duplication:
2. Modify `provisionAcmeCertificate()`:
```typescript
private async setupCertificateManager(
routes: IRouteConfig[],
certStore: string,
acmeOptions?: any
): Promise<SmartCertManager> {
const certManager = new SmartCertManager(routes, certStore, acmeOptions);
// Always set up the route update callback
certManager.setUpdateRoutesCallback(async (routes) => {
await this.updateRoutes(routes);
// Remove these lines:
// await this.addChallengeRoute();
// await this.removeChallengeRoute();
```
3. Update `stop()` method:
```typescript
// Always remove challenge route on shutdown
if (this.challengeRoute) {
await this.removeChallengeRoute();
}
```
4. Add concurrency control:
```typescript
private challengeRouteLock = new AsyncLock();
private async manageChallengeRoute(operation: 'add' | 'remove'): Promise<void> {
await this.challengeRouteLock.acquire('challenge-route', async () => {
if (operation === 'add') {
await this.addChallengeRoute();
} else {
await this.removeChallengeRoute();
}
});
if (this.networkProxyBridge.getNetworkProxy()) {
certManager.setNetworkProxy(this.networkProxyBridge.getNetworkProxy());
}
return certManager;
}
```
### Success Criteria
- [x] ACME certificate acquisition works after route updates
- [x] No "No route update callback set" errors occur
- [x] No EADDRINUSE errors when provisioning multiple certificates
- [x] Challenge route remains active during entire provisioning cycle
- [x] Port 80 is only bound once per SmartProxy instance
- [x] Proper cleanup on shutdown or error
- [x] All tests pass
- [x] Documentation clearly explains the behavior
- [x] Code is more maintainable and less prone to regression
### Implementation Summary
The bug has been successfully fixed through the following steps:
The port 80 EADDRINUSE issue has been successfully fixed through the following changes:
1. **Bug Fix Applied**: Added the missing `setUpdateRoutesCallback` call in the `updateRoutes` method
2. **Tests Created**: Comprehensive test suite validates the fix and prevents regression
3. **Documentation Updated**: Added section on dynamic route updates to the certificate management guide
4. **Code Refactored**: Extracted certificate manager creation into a helper method for better maintainability
1. **Challenge Route Lifecycle**: Modified to add challenge route once during initialization and keep it active throughout certificate provisioning
2. **Concurrency Control**: Added flags to prevent concurrent provisioning and duplicate challenge route operations
3. **Error Handling**: Enhanced error messages for port conflicts and proper cleanup on errors
4. **Tests**: Created comprehensive test suite for challenge route lifecycle scenarios
5. **Documentation**: Updated certificate management guide with troubleshooting section for port conflicts
The fix ensures that when routes are dynamically updated, the certificate manager maintains its ability to update routes for ACME challenges, preventing the "No route update callback set" error.
The fix ensures that port 80 is only bound once, preventing EADDRINUSE errors during concurrent certificate provisioning operations.
### Timeline
- Phase 1: Immediate fix (30 minutes)
- Phase 2: Test creation (1 hour)
- Phase 3: Documentation (30 minutes)
- Phase 4: Refactoring (1 hour)
- Phase 1: 2 hours (Challenge route lifecycle)
- Phase 2: 1 hour (Provisioning flow)
- Phase 3: 2 hours (Concurrency controls)
- Phase 4: 1 hour (Error handling)
- Phase 5: 2 hours (Testing)
- Phase 6: 1 hour (Documentation)
Total estimated time: 3 hours
Total estimated time: 9 hours
### Notes
- This is a critical bug that affects production use of SmartProxy
- The fix is straightforward but requires careful testing
- Consider backporting to v19.2.x branch if maintaining multiple versions
- This is a critical bug affecting ACME certificate provisioning
- The fix requires careful handling of concurrent operations
- Backward compatibility must be maintained
- Consider impact on renewal operations and edge cases