fix(strcuture): refactor responsibilities

This commit is contained in:
2025-05-19 17:28:05 +00:00
parent 8fb67922a5
commit 465148d553
62 changed files with 1414 additions and 2066 deletions

View File

@ -1,277 +1,172 @@
# SmartProxy Development Plan
# SmartProxy Architecture Refactoring Plan
cat /home/philkunz/.claude/CLAUDE.md
## Overview
## Critical Bug Fix: Port 80 EADDRINUSE with ACME Challenge Routes
Refactor the proxy architecture to provide clearer separation of concerns between HTTP/HTTPS traffic handling and low-level connection routing.
### Problem Statement
SmartProxy encounters an "EADDRINUSE" error on port 80 when provisioning multiple ACME certificates. The issue occurs because the certificate manager adds and removes the challenge route for each certificate individually, causing race conditions when multiple certificates are provisioned concurrently.
## Current Architecture Problems
### Root Cause
The `SmartCertManager` class adds the ACME challenge route (port 80) before provisioning each certificate and removes it afterward. When multiple certificates are provisioned:
1. Each provisioning cycle adds its own challenge route
2. This triggers `updateRoutes()` which calls `PortManager.updatePorts()`
3. Port 80 is repeatedly added/removed, causing binding conflicts
1. NetworkProxy name doesn't clearly indicate it handles HTTP/HTTPS
2. HTTP parsing logic is duplicated in RouteConnectionHandler
3. Redirect and static route handling is embedded in SmartProxy
4. Unclear separation between TCP routing and HTTP processing
### Implementation Plan
## Proposed Architecture
#### Phase 1: Refactor Challenge Route Lifecycle
1. **Modify challenge route handling** in `SmartCertManager`
- [x] Add challenge route once during initialization if ACME is configured
- [x] Keep challenge route active throughout entire certificate provisioning
- [x] Remove challenge route only after all certificates are provisioned
- [x] Add concurrency control to prevent multiple simultaneous route updates
### HttpProxy (renamed from NetworkProxy)
**Purpose**: Handle all HTTP/HTTPS traffic with TLS termination
#### Phase 2: Update Certificate Provisioning Flow
2. **Refactor certificate provisioning methods**
- [x] Separate challenge route management from individual certificate provisioning
- [x] Update `provisionAcmeCertificate()` to not add/remove challenge routes
- [x] Modify `provisionAllCertificates()` to handle challenge route lifecycle
- [x] Add error handling for challenge route initialization failures
**Responsibilities**:
- TLS termination for HTTPS
- HTTP/1.1 and HTTP/2 protocol handling
- HTTP request/response parsing
- HTTP to HTTPS redirects
- ACME challenge handling
- Static route handlers
- WebSocket protocol upgrades
- Connection pooling for backend servers
- Certificate management (ACME and static)
#### Phase 3: Implement Concurrency Controls
3. **Add synchronization mechanisms**
- [x] Implement mutex/lock for challenge route operations
- [x] Ensure certificate provisioning is properly serialized
- [x] Add safeguards against duplicate challenge routes
- [x] Handle edge cases (shutdown during provisioning, renewal conflicts)
### SmartProxy
**Purpose**: Low-level connection router and port manager
#### Phase 4: Enhance Error Handling
4. **Improve error handling and recovery**
- [x] Add specific error types for port conflicts
- [x] Implement retry logic for transient port binding issues
- [x] Add detailed logging for challenge route lifecycle
- [x] Ensure proper cleanup on errors
**Responsibilities**:
- Port management (listen on multiple ports)
- Route-based connection routing
- TLS passthrough (SNI-based routing)
- NFTables integration
- Delegate HTTP/HTTPS connections to HttpProxy
- Raw TCP proxying
- Connection lifecycle management
#### Phase 5: Create Comprehensive Tests
5. **Write tests for challenge route management**
- [x] Test concurrent certificate provisioning
- [x] Test challenge route persistence during provisioning
- [x] Test error scenarios (port already in use)
- [x] Test cleanup after provisioning
- [x] Test renewal scenarios with existing challenge routes
## Implementation Plan
#### Phase 6: Update Documentation
6. **Document the new behavior**
- [x] Update certificate management documentation
- [x] Add troubleshooting guide for port conflicts
- [x] Document the challenge route lifecycle
- [x] Include examples of proper ACME configuration
### Phase 1: Rename and Reorganize NetworkProxy ✅
### Technical Details
1. **Rename NetworkProxy to HttpProxy**
- Renamed directory from `network-proxy` to `http-proxy`
- Updated all imports and references
#### Specific Code Changes
2. **Update class and file names**
- Renamed `network-proxy.ts` to `http-proxy.ts`
- Updated `NetworkProxy` class to `HttpProxy` class
- Updated all type definitions and interfaces
1. In `SmartCertManager.initialize()`:
3. **Update exports**
- Updated exports in `ts/index.ts`
- Fixed imports across the codebase
### Phase 2: Extract HTTP Logic from SmartProxy ✅
1. **Create HTTP handler modules in HttpProxy**
- Created handlers directory with:
- `redirect-handler.ts` - HTTP redirect logic
- `static-handler.ts` - Static/ACME route handling
- `index.ts` - Module exports
2. **Move HTTP parsing from RouteConnectionHandler**
- Updated `handleRedirectAction` to delegate to `RedirectHandler`
- Updated `handleStaticAction` to delegate to `StaticHandler`
- Removed duplicated HTTP parsing logic
3. **Clean up references and naming**
- Updated all NetworkProxy references to HttpProxy
- Renamed config properties: `useNetworkProxy``useHttpProxy`
- Renamed config properties: `networkProxyPort``httpProxyPort`
- Fixed HttpProxyBridge methods and references
### Phase 3: Simplify SmartProxy
1. **Update RouteConnectionHandler**
- Remove embedded HTTP parsing
- Delegate HTTP routes to HttpProxy
- Focus on connection routing only
2. **Simplified route handling**
```typescript
// Add challenge route once at initialization
if (hasAcmeRoutes && this.acmeOptions?.email) {
await this.addChallengeRoute();
// Simplified handleRedirectAction
private handleRedirectAction(socket, record, route) {
// Delegate to HttpProxy
this.httpProxy.handleRedirect(socket, route);
}
```
2. Modify `provisionAcmeCertificate()`:
```typescript
// Remove these lines:
// await this.addChallengeRoute();
// await this.removeChallengeRoute();
```
3. Update `stop()` method:
```typescript
// Always remove challenge route on shutdown
if (this.challengeRoute) {
await this.removeChallengeRoute();
}
```
4. Add concurrency control:
```typescript
private challengeRouteLock = new AsyncLock();
private async manageChallengeRoute(operation: 'add' | 'remove'): Promise<void> {
await this.challengeRouteLock.acquire('challenge-route', async () => {
if (operation === 'add') {
await this.addChallengeRoute();
} else {
await this.removeChallengeRoute();
}
});
// Simplified handleStaticAction
private handleStaticAction(socket, record, route) {
// Delegate to HttpProxy
this.httpProxy.handleStatic(socket, route);
}
```
### Success Criteria
- [x] No EADDRINUSE errors when provisioning multiple certificates
- [x] Challenge route remains active during entire provisioning cycle
- [x] Port 80 is only bound once per SmartProxy instance
- [x] Proper cleanup on shutdown or error
- [x] All tests pass
- [x] Documentation clearly explains the behavior
3. **Update NetworkProxyBridge**
- Rename to HttpProxyBridge
- Update integration points
### Implementation Summary
### Phase 4: Consolidate HTTP Utilities ✅
The port 80 EADDRINUSE issue has been successfully fixed through the following changes:
1. **Move HTTP types to http-proxy**
- Created consolidated `http-types.ts` in `ts/proxies/http-proxy/models/`
- Includes HTTP status codes, error classes, and interfaces
- Added helper functions like `getStatusText()`
1. **Challenge Route Lifecycle**: Modified to add challenge route once during initialization and keep it active throughout certificate provisioning
2. **Concurrency Control**: Added flags to prevent concurrent provisioning and duplicate challenge route operations
3. **Error Handling**: Enhanced error messages for port conflicts and proper cleanup on errors
4. **Tests**: Created comprehensive test suite for challenge route lifecycle scenarios
5. **Documentation**: Updated certificate management guide with troubleshooting section for port conflicts
2. **Clean up ts/http directory**
- Kept only router functionality
- Replaced local HTTP types with re-exports from HttpProxy
- Updated imports throughout the codebase to use consolidated types
The fix ensures that port 80 is only bound once, preventing EADDRINUSE errors during concurrent certificate provisioning operations.
### Phase 5: Update Tests and Documentation ✅
### Timeline
- Phase 1: 2 hours (Challenge route lifecycle)
- Phase 2: 1 hour (Provisioning flow)
- Phase 3: 2 hours (Concurrency controls)
- Phase 4: 1 hour (Error handling)
- Phase 5: 2 hours (Testing)
- Phase 6: 1 hour (Documentation)
1. **Update test files**
- Renamed NetworkProxy references to HttpProxy
- Renamed test files to match new naming
- Updated imports and references throughout tests
- Fixed certificate manager method names
Total estimated time: 9 hours
2. **Update documentation**
- Updated README to reflect HttpProxy naming
- Updated architecture descriptions
- Updated usage examples
- Fixed all API documentation references
### Notes
- This is a critical bug affecting ACME certificate provisioning
- The fix requires careful handling of concurrent operations
- Backward compatibility must be maintained
- Consider impact on renewal operations and edge cases
## Migration Steps
## NEW FINDINGS: Additional Port Management Issues
1. Create feature branch: `refactor/http-proxy-consolidation`
2. Phase 1: Rename NetworkProxy (1 day)
3. Phase 2: Extract HTTP logic (2 days)
4. Phase 3: Simplify SmartProxy (1 day)
5. Phase 4: Consolidate utilities (1 day)
6. Phase 5: Update tests/docs (1 day)
7. Integration testing (1 day)
8. Code review and merge
### Problem Statement
Further investigation has revealed additional issues beyond the initial port 80 EADDRINUSE error:
## Benefits
1. **Race Condition in updateRoutes**: Certificate manager is recreated during route updates, potentially causing duplicate challenge routes
2. **Lost State**: The `challengeRouteActive` flag is not persisted when certificate manager is recreated
3. **No Global Synchronization**: Multiple concurrent route updates can create conflicting certificate managers
4. **Incomplete Cleanup**: Challenge route removal doesn't verify actual port release
1. **Clear Separation**: HTTP/HTTPS handling is clearly separated from TCP routing
2. **Better Naming**: HttpProxy clearly indicates its purpose
3. **No Duplication**: HTTP parsing logic exists in one place
4. **Maintainability**: Easier to modify HTTP handling without affecting routing
5. **Testability**: Each component has a single responsibility
6. **Performance**: Optimized paths for different traffic types
### Implementation Plan for Additional Fixes
## Future Enhancements
#### Phase 1: Fix updateRoutes Race Condition
1. **Preserve certificate manager state during route updates**
- [x] Track active challenge routes at SmartProxy level
- [x] Pass existing state to new certificate manager instances
- [x] Ensure challenge route is only added once across recreations
- [x] Add proper cleanup before recreation
After this refactoring, we can more easily add:
#### Phase 2: Implement Global Route Update Lock
2. **Add synchronization for route updates**
- [x] Implement mutex/semaphore for `updateRoutes` method
- [x] Prevent concurrent certificate manager recreations
- [x] Ensure atomic route updates
- [x] Add timeout handling for locks
1. HTTP/3 (QUIC) support in HttpProxy
2. Advanced HTTP features (compression, caching)
3. HTTP middleware system
4. Protocol-specific optimizations
5. Better HTTP/2 multiplexing
#### Phase 3: Improve State Management
3. **Persist critical state across certificate manager instances**
- [x] Create global state store for ACME operations
- [x] Track active challenge routes globally
- [x] Maintain port allocation state
- [x] Add state recovery mechanisms
## Breaking Changes
#### Phase 4: Enhance Cleanup Verification
4. **Verify resource cleanup before recreation**
- [x] Wait for old certificate manager to fully stop
- [x] Verify challenge route removal from port manager
- [x] Add cleanup confirmation callbacks
- [x] Implement rollback on cleanup failure
1. `NetworkProxy` class renamed to `HttpProxy`
2. Import paths change from `network-proxy` to `http-proxy`
3. Some type names may change for consistency
#### Phase 5: Add Comprehensive Testing
5. **Test race conditions and edge cases**
- [x] Test rapid route updates with ACME
- [x] Test concurrent certificate manager operations
- [x] Test state persistence across recreations
- [x] Test cleanup verification logic
## Rollback Plan
### Technical Implementation
1. **Global Challenge Route Tracker**:
```typescript
class SmartProxy {
private globalChallengeRouteActive = false;
private routeUpdateLock = new Mutex();
async updateRoutes(newRoutes: IRouteConfig[]): Promise<void> {
await this.routeUpdateLock.runExclusive(async () => {
// Update logic here
});
}
}
```
2. **State Preservation**:
```typescript
if (this.certManager) {
const state = {
challengeRouteActive: this.globalChallengeRouteActive,
acmeOptions: this.certManager.getAcmeOptions(),
// ... other state
};
await this.certManager.stop();
await this.verifyChallengeRouteRemoved();
this.certManager = await this.createCertificateManager(
newRoutes,
'./certs',
state
);
}
```
3. **Cleanup Verification**:
```typescript
private async verifyChallengeRouteRemoved(): Promise<void> {
const maxRetries = 10;
for (let i = 0; i < maxRetries; i++) {
if (!this.portManager.isListening(80)) {
return;
}
await this.sleep(100);
}
throw new Error('Failed to verify challenge route removal');
}
```
### Success Criteria
- [ ] No race conditions during route updates
- [ ] State properly preserved across certificate manager recreations
- [ ] No duplicate challenge routes
- [ ] Clean resource management
- [ ] All edge cases handled gracefully
### Timeline for Additional Fixes
- Phase 1: 3 hours (Race condition fix)
- Phase 2: 2 hours (Global synchronization)
- Phase 3: 2 hours (State management)
- Phase 4: 2 hours (Cleanup verification)
- Phase 5: 3 hours (Testing)
Total estimated time: 12 hours
### Priority
These additional fixes are HIGH PRIORITY as they address fundamental issues that could cause:
- Port binding errors
- Certificate provisioning failures
- Resource leaks
- Inconsistent proxy state
The fixes should be implemented immediately after the initial port 80 EADDRINUSE fix is deployed.
### Implementation Complete
All additional port management issues have been successfully addressed:
1. **Mutex Implementation**: Created a custom `Mutex` class for synchronizing route updates
2. **Global State Tracking**: Implemented `AcmeStateManager` to track challenge routes globally
3. **State Preservation**: Modified `SmartCertManager` to accept and preserve state across recreations
4. **Cleanup Verification**: Added `verifyChallengeRouteRemoved` method to ensure proper cleanup
5. **Comprehensive Testing**: Created test suites for race conditions and state management
The implementation ensures:
- No concurrent route updates can create conflicting states
- Challenge route state is preserved across certificate manager recreations
- Port 80 is properly managed without EADDRINUSE errors
- All resources are cleaned up properly during shutdown
All tests are ready to run and the implementation is complete.
If issues arise:
1. Git revert to previous commit
2. Re-deploy previous version
3. Document lessons learned
4. Plan incremental changes