smartproxy/readme.hints.md
Philipp Kunz d45485985a Fix socket error handling to prevent server crashes on ECONNREFUSED
This commit addresses critical issues where unhandled socket connection errors (ECONNREFUSED) would crash the server and cause memory leaks with rising connection counts.

Changes:
- Add createSocketWithErrorHandler() utility that attaches error handlers immediately upon socket creation
- Update https-passthrough-handler to use safe socket creation and clean up client sockets on server connection failure
- Update https-terminate-to-http-handler to use safe socket creation
- Ensure proper connection cleanup when server connections fail
- Document the fix in readme.hints.md and create implementation plan in readme.plan.md

The fix prevents race conditions where sockets could emit errors before handlers were attached, and ensures failed connections are properly cleaned up to prevent memory leaks.
2025-06-01 13:30:06 +00:00

460 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# SmartProxy Project Hints
## Project Overview
- Package: `@push.rocks/smartproxy` high-performance proxy supporting HTTP(S), TCP, WebSocket, and ACME integration.
- Written in TypeScript, compiled output in `dist_ts/`, uses ESM with NodeNext resolution.
## Important: ACME Configuration in v19.0.0
- **Breaking Change**: ACME configuration must be placed within individual route TLS settings, not at the top level
- Route-level ACME config is the ONLY way to enable SmartAcme initialization
- SmartCertManager requires email in route config for certificate acquisition
- Top-level ACME configuration is ignored in v19.0.0
## Repository Structure
- `ts/` TypeScript source files:
- `index.ts` exports main modules.
- `plugins.ts` centralizes native and third-party imports.
- Subdirectories: `networkproxy/`, `nftablesproxy/`, `port80handler/`, `redirect/`, `smartproxy/`.
- Key classes: `ProxyRouter` (`classes.router.ts`), `SmartProxy` (`classes.smartproxy.ts`), plus handlers/managers.
- `dist_ts/` transpiled `.js` and `.d.ts` files mirroring `ts/` structure.
- `test/` test suites in TypeScript:
- `test.router.ts` routing logic (hostname matching, wildcards, path parameters, config management).
- `test.smartproxy.ts` proxy behavior tests (TCP forwarding, SNI handling, concurrency, chaining, timeouts).
- `test/helpers/` utilities (e.g., certificates).
- `assets/certs/` placeholder certificates for ACME and TLS.
## Development Setup
- Requires `pnpm` (v10+).
- Install dependencies: `pnpm install`.
- Build: `pnpm build` (runs `tsbuild --web --allowimplicitany`).
- Test: `pnpm test` (runs `tstest test/`).
- Format: `pnpm format` (runs `gitzone format`).
## How to Test
### Test Structure
Tests use tapbundle from `@git.zone/tstest`. The correct pattern is:
```typescript
import { tap, expect } from '@git.zone/tstest/tapbundle';
tap.test('test description', async () => {
// Test logic here
expect(someValue).toEqual(expectedValue);
});
// IMPORTANT: Must end with tap.start()
tap.start();
```
### Expect Syntax (from @push.rocks/smartexpect)
```typescript
// Type assertions
expect('hello').toBeTypeofString();
expect(42).toBeTypeofNumber();
// Equality
expect('hithere').toEqual('hithere');
// Negated assertions
expect(1).not.toBeTypeofString();
// Regular expressions
expect('hithere').toMatch(/hi/);
// Numeric comparisons
expect(5).toBeGreaterThan(3);
expect(0.1 + 0.2).toBeCloseTo(0.3, 10);
// Arrays
expect([1, 2, 3]).toContain(2);
expect([1, 2, 3]).toHaveLength(3);
// Async assertions
await expect(asyncFunction()).resolves.toEqual('expected');
await expect(asyncFunction()).resolves.withTimeout(5000).toBeTypeofString();
// Complex object navigation
expect(complexObject)
.property('users')
.arrayItem(0)
.property('name')
.toEqual('Alice');
```
### Test Modifiers
- `tap.only.test()` - Run only this test
- `tap.skip.test()` - Skip a test
- `tap.timeout()` - Set test-specific timeout
### Running Tests
- All tests: `pnpm test`
- Specific test: `tsx test/test.router.ts`
- With options: `tstest test/**/*.ts --verbose --timeout 60`
### Test File Requirements
- Must start with `test.` prefix
- Must use `.ts` extension
- Must call `tap.start()` at the end
## Coding Conventions
- Import modules via `plugins.ts`:
```ts
import * as plugins from './plugins.ts';
const server = new plugins.http.Server();
```
- Reference plugins with full path: `plugins.acme`, `plugins.smartdelay`, `plugins.minimatch`, etc.
- Path patterns support globs (`*`) and parameters (`:param`) in `ProxyRouter`.
- Wildcard hostname matching leverages `minimatch` patterns.
## Key Components
- **ProxyRouter**
- Methods: `routeReq`, `routeReqWithDetails`.
- Hostname matching: case-insensitive, strips port, supports exact, wildcard, TLD, complex patterns.
- Path routing: exact, wildcard, parameter extraction (`pathParams`), returns `pathMatch` and `pathRemainder`.
- Config API: `setNewProxyConfigs`, `addProxyConfig`, `removeProxyConfig`, `getHostnames`, `getProxyConfigs`.
- **SmartProxy**
- Manages one or more `net.Server` instances to forward TCP streams.
- Options: `preserveSourceIP`, `defaultAllowedIPs`, `globalPortRanges`, `sniEnabled`.
- DomainConfigManager: round-robin selection for multiple target IPs.
- Graceful shutdown in `stop()`, ensures no lingering servers or sockets.
## Notable Points
- **TSConfig**: `module: NodeNext`, `verbatimModuleSyntax`, allows `.js` extension imports in TS.
- Mermaid diagrams and architecture flows in `readme.md` illustrate component interactions and protocol flows.
- CLI entrypoint (`cli.js`) supports command-line usage (ACME, proxy controls).
- ACME and certificate handling via `Port80Handler` and `helpers.certificates.ts`.
## ACME/Certificate Configuration Example (v19.0.0)
```typescript
const proxy = new SmartProxy({
routes: [{
name: 'example.com',
match: { domains: 'example.com', ports: 443 },
action: {
type: 'forward',
target: { host: 'localhost', port: 8080 },
tls: {
mode: 'terminate',
certificate: 'auto',
acme: { // ACME config MUST be here, not at top level
email: 'ssl@example.com',
useProduction: false,
challengePort: 80
}
}
}
}]
});
```
## TODOs / Considerations
- Ensure import extensions in source match build outputs (`.ts` vs `.js`).
- Update `plugins.ts` when adding new dependencies.
- Maintain test coverage for new routing or proxy features.
- Keep `ts/` and `dist_ts/` in sync after refactors.
- Consider implementing top-level ACME config support for backward compatibility
## HTTP-01 ACME Challenge Fix (v19.3.8)
### Issue
Non-TLS connections on ports configured in `useHttpProxy` were not being forwarded to HttpProxy. This caused ACME HTTP-01 challenges to fail when the ACME port (usually 80) was included in `useHttpProxy`.
### Root Cause
In the `RouteConnectionHandler.handleForwardAction` method, only connections with TLS settings (mode: 'terminate' or 'terminate-and-reencrypt') were being forwarded to HttpProxy. Non-TLS connections were always handled as direct connections, even when the port was configured for HttpProxy.
### Solution
Added a check for non-TLS connections on ports listed in `useHttpProxy`:
```typescript
// No TLS settings - check if this port should use HttpProxy
const isHttpProxyPort = this.settings.useHttpProxy?.includes(record.localPort);
if (isHttpProxyPort && this.httpProxyBridge.getHttpProxy()) {
// Forward non-TLS connections to HttpProxy if configured
this.httpProxyBridge.forwardToHttpProxy(/*...*/);
return;
}
```
### Test Coverage
- `test/test.http-fix-unit.ts` - Unit tests verifying the fix
- Tests confirm that non-TLS connections on HttpProxy ports are properly forwarded
- Tests verify that non-HttpProxy ports still use direct connections
### Configuration Example
```typescript
const proxy = new SmartProxy({
useHttpProxy: [80], // Enable HttpProxy for port 80
httpProxyPort: 8443,
acme: {
email: 'ssl@example.com',
port: 80
},
routes: [
// Your routes here
]
});
```
## ACME Certificate Provisioning Timing Fix (v19.3.9)
### Issue
Certificate provisioning would start before ports were listening, causing ACME HTTP-01 challenges to fail with connection refused errors.
### Root Cause
SmartProxy initialization sequence:
1. Certificate manager initialized → immediately starts provisioning
2. Ports start listening (too late for ACME challenges)
### Solution
Deferred certificate provisioning until after ports are ready:
```typescript
// SmartCertManager.initialize() now skips automatic provisioning
// SmartProxy.start() calls provisionAllCertificates() directly after ports are listening
```
### Test Coverage
- `test/test.acme-timing-simple.ts` - Verifies proper timing sequence
### Migration
Update to v19.3.9+, no configuration changes needed.
## Socket Handler Race Condition Fix (v19.5.0)
### Issue
Initial data chunks were being emitted before async socket handlers had completed setup, causing data loss when handlers performed async operations before setting up data listeners.
### Root Cause
The `handleSocketHandlerAction` method was using `process.nextTick` to emit initial chunks regardless of whether the handler was sync or async. This created a race condition where async handlers might not have their listeners ready when the initial data was emitted.
### Solution
Differentiated between sync and async handlers:
```typescript
const result = route.action.socketHandler(socket);
if (result instanceof Promise) {
// Async handler - wait for completion before emitting initial data
result.then(() => {
if (initialChunk && initialChunk.length > 0) {
socket.emit('data', initialChunk);
}
}).catch(/*...*/);
} else {
// Sync handler - use process.nextTick as before
if (initialChunk && initialChunk.length > 0) {
process.nextTick(() => {
socket.emit('data', initialChunk);
});
}
}
```
### Test Coverage
- `test/test.socket-handler-race.ts` - Specifically tests async handlers with delayed listener setup
- Verifies that initial data is received even when handler sets up listeners after async work
### Usage Note
Socket handlers require initial data from the client to trigger routing (not just a TLS handshake). Clients must send at least one byte of data for the handler to be invoked.
## Route-Specific Security Implementation (v19.5.3)
### Issue
Route-specific security configurations (ipAllowList, ipBlockList, authentication) were defined in the route types but not enforced at runtime.
### Root Cause
The RouteConnectionHandler only checked global IP validation but didn't enforce route-specific security rules after matching a route.
### Solution
Added security checks after route matching:
```typescript
// Apply route-specific security checks
const routeSecurity = route.action.security || route.security;
if (routeSecurity) {
// Check IP allow/block lists
if (routeSecurity.ipAllowList || routeSecurity.ipBlockList) {
const isIPAllowed = this.securityManager.isIPAuthorized(
remoteIP,
routeSecurity.ipAllowList || [],
routeSecurity.ipBlockList || []
);
if (!isIPAllowed) {
socket.end();
this.connectionManager.cleanupConnection(record, 'route_ip_blocked');
return;
}
}
}
```
### Test Coverage
- `test/test.route-security-unit.ts` - Unit tests verifying SecurityManager.isIPAuthorized logic
- Tests confirm IP allow/block lists work correctly with glob patterns
### Configuration Example
```typescript
const routes: IRouteConfig[] = [{
name: 'secure-api',
match: { ports: 8443, domains: 'api.example.com' },
action: {
type: 'forward',
target: { host: 'localhost', port: 3000 },
security: {
ipAllowList: ['192.168.1.*', '10.0.0.0/8'], // Allow internal IPs
ipBlockList: ['192.168.1.100'], // But block specific IP
maxConnections: 100, // Per-route limit (TODO)
authentication: { // HTTP-only, requires TLS termination
type: 'basic',
credentials: [{ username: 'api', password: 'secret' }]
}
}
}
}];
```
### Notes
- IP lists support glob patterns (via minimatch): `192.168.*`, `10.?.?.1`
- Block lists take precedence over allow lists
- Authentication requires TLS termination (cannot be enforced on passthrough/direct connections)
- Per-route connection limits are not yet implemented
- Security is defined at the route level (route.security), not in the action
- Route matching is based solely on match criteria; security is enforced after matching
## Performance Issues Investigation (v19.5.3+)
### Critical Blocking Operations Found
1. **Busy Wait Loop** in `ts/proxies/nftables-proxy/nftables-proxy.ts:235-238`
- Blocks entire event loop with `while (Date.now() < waitUntil) {}`
- Should use `await new Promise(resolve => setTimeout(resolve, delay))`
2. **Synchronous Filesystem Operations**
- Certificate management uses `fs.existsSync()`, `fs.mkdirSync()`, `fs.readFileSync()`
- NFTables proxy uses `execSync()` for system commands
- Certificate store uses `ensureDirSync()`, `fileExistsSync()`, `removeManySync()`
3. **Memory Leak Risks**
- Several `setInterval()` calls without storing references for cleanup
- Event listeners added without proper cleanup in error paths
- Missing `removeAllListeners()` calls in some connection cleanup scenarios
### Performance Recommendations
- Replace all sync filesystem operations with async alternatives
- Fix the busy wait loop immediately (critical event loop blocker)
- Add proper cleanup for all timers and event listeners
- Consider worker threads for CPU-intensive operations
- See `readme.problems.md` for detailed analysis and recommendations
## Performance Optimizations Implemented (Phase 1 - v19.6.0)
### 1. Async Utilities Created (`ts/core/utils/async-utils.ts`)
- **delay()**: Non-blocking alternative to busy wait loops
- **retryWithBackoff()**: Retry operations with exponential backoff
- **withTimeout()**: Execute operations with timeout protection
- **parallelLimit()**: Run async operations with concurrency control
- **debounceAsync()**: Debounce async functions
- **AsyncMutex**: Ensure exclusive access to resources
- **CircuitBreaker**: Protect against cascading failures
### 2. Filesystem Utilities Created (`ts/core/utils/fs-utils.ts`)
- **AsyncFileSystem**: Complete async filesystem operations
- exists(), ensureDir(), readFile(), writeFile()
- readJSON(), writeJSON() with proper error handling
- copyFile(), moveFile(), removeDir()
- Stream creation and file listing utilities
### 3. Critical Fixes Applied
#### Busy Wait Loop Fixed
- **Location**: `ts/proxies/nftables-proxy/nftables-proxy.ts:235-238`
- **Fix**: Replaced `while (Date.now() < waitUntil) {}` with `await delay(ms)`
- **Impact**: Unblocks event loop, massive performance improvement
#### Certificate Manager Migration
- **File**: `ts/proxies/http-proxy/certificate-manager.ts`
- Added async initialization method
- Kept sync methods for backward compatibility with deprecation warnings
- Added `loadDefaultCertificatesAsync()` method
#### Certificate Store Migration
- **File**: `ts/proxies/smart-proxy/cert-store.ts`
- Replaced all `fileExistsSync`, `ensureDirSync`, `removeManySync`
- Used parallel operations with `Promise.all()` for better performance
- Improved error handling and async JSON operations
#### NFTables Proxy Improvements
- Added deprecation warnings to sync methods
- Created `executeWithTempFile()` helper for common pattern
- Started migration of sync filesystem operations to async
- Added import for delay and AsyncFileSystem utilities
### 4. Backward Compatibility Maintained
- All sync methods retained with deprecation warnings
- Existing APIs unchanged, new async methods added alongside
- Feature flags prepared for gradual rollout
### 5. Phase 1 Completion Status
✅ **Phase 1 COMPLETE** - All critical performance fixes have been implemented:
- ✅ Fixed busy wait loop in nftables-proxy.ts
- ✅ Created async utilities (delay, retry, timeout, parallelLimit, mutex, circuit breaker)
- ✅ Created filesystem utilities (AsyncFileSystem with full async operations)
- ✅ Migrated all certificate management to async operations
- ✅ Migrated nftables-proxy filesystem operations to async (except stopSync for exit handlers)
- ✅ All tests passing for new utilities
### 6. Phase 2 Progress Status
🔨 **Phase 2 IN PROGRESS** - Resource Lifecycle Management:
- ✅ Created LifecycleComponent base class for automatic resource cleanup
- ✅ Created BinaryHeap data structure for priority queue operations
- ✅ Created EnhancedConnectionPool with backpressure and health checks
- ✅ Cleaned up legacy code (removed ts/common/, event-utils.ts, event-system.ts)
- 📋 TODO: Migrate existing components to extend LifecycleComponent
- 📋 TODO: Add integration tests for resource management
### 7. Next Steps (Remaining Work)
- **Phase 2 (cont)**: Migrate components to use LifecycleComponent
- **Phase 3**: Add worker threads for CPU-intensive operations
- **Phase 4**: Performance monitoring dashboard
## Socket Error Handling Fix (v19.5.11+)
### Issue
Server crashed with unhandled 'error' event when backend connections failed (ECONNREFUSED). Also caused memory leak with rising active connection count as failed connections weren't cleaned up properly.
### Root Cause
1. **Race Condition**: In forwarding handlers, sockets were created with `net.connect()` but error handlers were attached later, creating a window where errors could crash the server
2. **Incomplete Cleanup**: When server connections failed, client sockets weren't properly cleaned up, leaving connection records in memory
### Solution
Created `createSocketWithErrorHandler()` utility that attaches error handlers immediately:
```typescript
// Before (race condition):
const socket = net.connect(port, host);
// ... other code ...
socket.on('error', handler); // Too late!
// After (safe):
const socket = createSocketWithErrorHandler({
port, host,
onError: (error) => {
// Handle error immediately
clientSocket.destroy();
},
onConnect: () => {
// Set up forwarding
}
});
```
### Changes Made
1. **New Utility**: `ts/core/utils/socket-utils.ts` - Added `createSocketWithErrorHandler()`
2. **Updated Handlers**:
- `https-passthrough-handler.ts` - Uses safe socket creation
- `https-terminate-to-http-handler.ts` - Uses safe socket creation
3. **Connection Cleanup**: Client sockets destroyed immediately on server connection failure
### Test Coverage
- `test/test.socket-error-handling.node.ts` - Verifies server doesn't crash on ECONNREFUSED
- `test/test.forwarding-error-fix.node.ts` - Tests forwarding handlers handle errors gracefully
### Configuration
No configuration changes needed. The fix is transparent to users.