fix(performance): start with planning performance optimizations

This commit is contained in:
2025-05-31 17:14:15 +00:00
parent af753ba1a8
commit 02603c3b07
6 changed files with 1397 additions and 1139 deletions

View File

@@ -1,86 +1,170 @@
# SmartProxy Module Problems
# SmartProxy Performance Issues Report
Based on test analysis, the following potential issues have been identified in the SmartProxy module:
## Executive Summary
This report identifies performance issues and blocking operations in the SmartProxy codebase that could impact scalability and responsiveness under high load.
## 1. HttpProxy Route Configuration Issue
**Location**: `ts/proxies/http-proxy/http-proxy.ts:380`
**Problem**: The HttpProxy is trying to read the 'type' property of an undefined object when updating route configurations.
**Evidence**: `test.http-forwarding-fix.ts` fails with:
```
TypeError: Cannot read properties of undefined (reading 'type')
at HttpProxy.updateRouteConfigs (/mnt/data/lossless/push.rocks/smartproxy/ts/proxies/http-proxy/http-proxy.ts:380:24)
```
**Impact**: Routes with `useHttpProxy` configuration may not work properly.
## Critical Issues
## 2. Connection Forwarding Issues
**Problem**: Basic TCP forwarding appears to not be working correctly after the simplification to just 'forward' and 'socket-handler' action types.
**Evidence**: Multiple forwarding tests timeout waiting for data to be forwarded:
- `test.forwarding-fix-verification.ts` - times out waiting for forwarded data
- `test.connection-forwarding.ts` - times out on SNI-based forwarding
**Impact**: The 'forward' action type may not be properly forwarding connections to target servers.
### 1. **Synchronous Filesystem Operations**
These operations block the event loop and should be replaced with async alternatives:
## 3. Missing Certificate Manager Methods
**Problem**: Tests expect `provisionAllCertificates` method on certificate manager but it may not exist or may not be properly initialized.
**Evidence**: Multiple tests fail with "this.certManager.provisionAllCertificates is not a function"
**Impact**: Certificate provisioning may not work as expected.
#### Certificate Management
- `ts/proxies/http-proxy/certificate-manager.ts:29`: `fs.existsSync()`
- `ts/proxies/http-proxy/certificate-manager.ts:30`: `fs.mkdirSync()`
- `ts/proxies/http-proxy/certificate-manager.ts:49-50`: `fs.readFileSync()` for loading certificates
## 4. Route Update Mechanism
**Problem**: The route update mechanism may have issues preserving certificate manager callbacks and other state.
**Evidence**: Tests specifically designed to verify callback preservation after route updates.
**Impact**: Dynamic route updates might break certificate management functionality.
#### NFTables Proxy
- `ts/proxies/nftables-proxy/nftables-proxy.ts`: Multiple uses of `execSync()` for system commands
- `ts/proxies/nftables-proxy/nftables-proxy.ts`: Multiple `fs.writeFileSync()` and `fs.unlinkSync()` operations
## 5. Route-Specific Security Not Fully Implemented
**Problem**: While the route definitions support security configurations (ipAllowList, ipBlockList, authentication), these are not being enforced at the route level.
**Evidence**:
- SecurityManager has methods like `isIPAuthorized` for route-specific security
- Route connection handler only checks global IP validation, not route-specific security rules
- No evidence of route.action.security being checked when handling connections
**Impact**: Route-specific security rules defined in configuration are not enforced, potentially allowing unauthorized access.
**Status**: ✅ FIXED - Route-specific IP allow/block lists are now enforced when a route is matched. Authentication is logged as not enforceable for non-terminated connections.
**Additional Fix**: Removed security checks from route matching logic - security is now properly enforced AFTER a route is matched, not during matching.
#### Certificate Store
- `ts/proxies/smart-proxy/cert-store.ts:8`: `ensureDirSync()`
- `ts/proxies/smart-proxy/cert-store.ts:15,31,76`: `fileExistsSync()`
- `ts/proxies/smart-proxy/cert-store.ts:77`: `removeManySync()`
## 6. Security Property Location Consolidation
**Problem**: Security was defined in two places - route.security and route.action.security - causing confusion.
**Status**: ✅ FIXED - Consolidated to only route.security. Removed action.security from types and updated all references.
### 2. **Event Loop Blocking Operations**
#### Busy Wait Loop
- `ts/proxies/nftables-proxy/nftables-proxy.ts:235-238`:
```typescript
const waitUntil = Date.now() + retryDelayMs;
while (Date.now() < waitUntil) {
// busy wait - blocks event loop completely
}
```
This is extremely problematic as it blocks the entire Node.js event loop.
### 3. **Potential Memory Leaks**
#### Timer Management Issues
Several timers are created without proper cleanup:
- `ts/proxies/http-proxy/function-cache.ts`: `setInterval()` without storing reference for cleanup
- `ts/proxies/http-proxy/request-handler.ts`: `setInterval()` for rate limit cleanup without cleanup
- `ts/core/utils/shared-security-manager.ts`: `cleanupInterval` stored but no cleanup method
#### Event Listener Accumulation
- Multiple instances of event listeners being added without corresponding cleanup
- Connection handlers add listeners without always removing them on connection close
### 4. **Connection Pool Management**
#### ConnectionPool (ts/proxies/http-proxy/connection-pool.ts)
**Good practices observed:**
- Proper connection lifecycle management
- Periodic cleanup of idle connections
- Connection limits enforcement
**Potential issues:**
- No backpressure mechanism when pool is full
- Synchronous sorting operation in `cleanupConnectionPool()` could be slow with many connections
### 5. **Resource Management Issues**
#### Socket Cleanup
- Some error paths don't properly clean up sockets
- Missing `removeAllListeners()` in some error scenarios could lead to memory leaks
#### Timeout Management
- Inconsistent timeout handling across different components
- Some sockets created without timeout settings
### 6. **JSON Operations on Large Objects**
- `ts/proxies/smart-proxy/cert-store.ts:21`: `JSON.parse()` on certificate metadata
- `ts/proxies/smart-proxy/cert-store.ts:71`: `JSON.stringify()` with pretty printing
- `ts/proxies/http-proxy/function-cache.ts:76`: `JSON.stringify()` for cache keys (called frequently)
## Recommendations
1. **Verify Forward Action Implementation**: Check that the 'forward' action type properly establishes bidirectional data flow between client and target server. ✅ FIXED - Basic forwarding now works correctly.
### Immediate Actions (High Priority)
2. **Fix HttpProxy Route Handling**: Ensure that route objects passed to HttpProxy.updateRouteConfigs have the expected structure with all required properties. ✅ FIXED - Routes now preserve their structure.
1. **Replace Synchronous Operations**
```typescript
// Instead of:
if (fs.existsSync(path)) { ... }
// Use:
try {
await fs.promises.access(path);
// file exists
} catch {
// file doesn't exist
}
```
3. **Review Certificate Manager API**: Ensure all expected methods exist and are properly documented.
2. **Fix Busy Wait Loop**
```typescript
// Instead of:
while (Date.now() < waitUntil) { }
// Use:
await new Promise(resolve => setTimeout(resolve, retryDelayMs));
```
4. **Add Integration Tests**: Many unit tests are testing internal implementation details. Consider adding more integration tests that test the public API.
3. **Add Timer Cleanup**
```typescript
class Component {
private cleanupTimer?: NodeJS.Timeout;
start() {
this.cleanupTimer = setInterval(() => { ... }, 60000);
}
stop() {
if (this.cleanupTimer) {
clearInterval(this.cleanupTimer);
this.cleanupTimer = undefined;
}
}
}
```
5. **Implement Route-Specific Security**: Add security checks when a route is matched to enforce route-specific IP allow/block lists and authentication rules. ✅ FIXED - IP allow/block lists are now enforced at the route level.
### Medium Priority
6. **Fix TLS Detection Logic**: The connection handler was treating all connections as TLS. This has been partially fixed but needs proper testing for all TLS modes.
1. **Optimize JSON Operations**
- Cache JSON.stringify results for frequently used objects
- Consider using faster hashing for cache keys (e.g., crypto.createHash)
- Use streaming JSON parsers for large objects
## 7. HTTP Domain Matching Issue
**Problem**: Routes with domain restrictions fail to match HTTP connections because domain information is only available after HTTP headers are received, but route matching happens immediately upon connection.
**Evidence**:
- `test.http-port8080-forwarding.ts` - "No route found for connection on port 8080" despite having a matching route
- HTTP connections provide domain info via the Host header, which arrives after the initial TCP connection
- Route matching in `handleInitialData` happens before HTTP headers are parsed
**Impact**: HTTP routes with domain restrictions cannot be matched, forcing users to remove domain restrictions for HTTP routes.
**Root Cause**: For non-TLS connections, SmartProxy attempts to match routes immediately, but the domain information needed for matching is only available after parsing HTTP headers.
**Status**: ✅ FIXED - Added skipDomainCheck parameter to route matching for HTTP proxy ports. When a port is configured with useHttpProxy and the connection is not TLS, domain validation is skipped at the initial route matching stage, allowing the HttpProxy to handle domain-based routing after headers are received.
2. **Improve Connection Pool**
- Implement backpressure/queueing when pool is full
- Use a heap or priority queue for connection management instead of sorting
## 8. HttpProxy Plain HTTP Forwarding Issue
**Problem**: HttpProxy is an HTTPS server but SmartProxy forwards plain HTTP connections to it via `useHttpProxy` configuration.
**Evidence**:
- `test.http-port8080-forwarding.ts` - Connection immediately closed after forwarding to HttpProxy
- HttpProxy is created with `http2.createSecureServer` expecting TLS connections
- SmartProxy forwards raw HTTP data to HttpProxy's HTTPS port
**Impact**: Plain HTTP connections cannot be handled by HttpProxy, despite `useHttpProxy` configuration suggesting this should work.
**Root Cause**: Design mismatch - HttpProxy is designed for HTTPS/TLS termination, not plain HTTP forwarding.
**Status**: Documented. The `useHttpProxy` configuration should only be used for ports that receive TLS connections requiring termination. For plain HTTP forwarding, use direct forwarding without HttpProxy.
3. **Standardize Resource Cleanup**
- Create a base class for components with lifecycle management
- Ensure all event listeners are removed on cleanup
- Add abort controllers for better cancellation support
## 9. Route Security Configuration Location Issue
**Problem**: Tests were placing security configuration in `route.action.security` instead of `route.security`.
**Evidence**:
- `test.route-security.ts` - IP block list test failing because security was in wrong location
- IRouteConfig interface defines security at route level, not inside action
**Impact**: Security rules defined in action.security were ignored, causing tests to fail.
**Status**: ✅ FIXED - Updated tests to place security configuration at the correct location (route.security).
### Long-term Improvements
1. **Worker Threads**
- Move CPU-intensive operations to worker threads
- Consider using worker pools for NFTables operations
2. **Monitoring and Metrics**
- Add performance monitoring for event loop lag
- Track connection pool utilization
- Monitor memory usage patterns
3. **Graceful Degradation**
- Implement circuit breakers for backend connections
- Add request queuing with overflow protection
- Implement adaptive timeout strategies
## Impact Assessment
These issues primarily affect:
- **Scalability**: Blocking operations limit concurrent connection handling
- **Responsiveness**: Event loop blocking causes latency spikes
- **Stability**: Memory leaks could cause crashes under sustained load
- **Resource Usage**: Inefficient resource management increases memory/CPU usage
## Testing Recommendations
1. Load test with high connection counts (10k+ concurrent)
2. Monitor event loop lag under stress
3. Test long-running scenarios to detect memory leaks
4. Benchmark with async vs sync operations to measure improvement
## Conclusion
While SmartProxy has good architectural design and many best practices, the identified blocking operations and resource management issues could significantly impact performance under high load. The most critical issues (busy wait loop and synchronous filesystem operations) should be addressed immediately.