# SmartProxy Metrics Improvement Plan

## Overview

The current `getThroughputRate()` implementation calculates cumulative throughput over a 60-second window rather than providing an actual rate, making metrics misleading for monitoring systems. This plan outlines a comprehensive redesign of the metrics system to provide accurate, time-series based metrics suitable for production monitoring.

## 1. Core Issues with Current Implementation

- **Cumulative vs Rate**: The current method accumulates all bytes from connections in the last minute rather than calculating an actual throughput rate
- **No Time-Series Data**: Cannot track throughput changes over time
- **Inaccurate Estimates**: Attempting to estimate rates for older connections is fundamentally flawed
- **No Sliding Windows**: Cannot provide different time window views (1s, 10s, 60s, etc.)
- **Limited Granularity**: Only provides a single 60-second view

## 2. Proposed Architecture

### A. Time-Series Throughput Tracking

```typescript
interface IThroughputSample {
  timestamp: number;
  bytesIn: number;
  bytesOut: number;
}

class ThroughputTracker {
  private samples: IThroughputSample[] = [];
  private readonly MAX_SAMPLES = 3600; // 1 hour at 1 sample/second
  private lastSampleTime: number = 0;
  private accumulatedBytesIn: number = 0;
  private accumulatedBytesOut: number = 0;

  // Called on every data transfer
  public recordBytes(bytesIn: number, bytesOut: number): void {
    this.accumulatedBytesIn += bytesIn;
    this.accumulatedBytesOut += bytesOut;
  }

  // Called periodically (every second)
  public takeSample(): void {
    const now = Date.now();

    // Record accumulated bytes since last sample
    this.samples.push({
      timestamp: now,
      bytesIn: this.accumulatedBytesIn,
      bytesOut: this.accumulatedBytesOut
    });

    // Reset accumulators
    this.accumulatedBytesIn = 0;
    this.accumulatedBytesOut = 0;
    this.lastSampleTime = now;

    // Trim old samples (keep at most MAX_SAMPLES seconds = 1 hour of history)
    const cutoff = now - this.MAX_SAMPLES * 1000;
    this.samples = this.samples.filter(s => s.timestamp > cutoff);
  }

  // Get rate over specified window
  public getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
    const now = Date.now();
    const windowStart = now - (windowSeconds * 1000);

    const relevantSamples = this.samples.filter(s => s.timestamp > windowStart);
    if (relevantSamples.length === 0) {
      return { bytesInPerSec: 0, bytesOutPerSec: 0 };
    }

    const totalBytesIn = relevantSamples.reduce((sum, s) => sum + s.bytesIn, 0);
    const totalBytesOut = relevantSamples.reduce((sum, s) => sum + s.bytesOut, 0);

    // Guard against a zero-length window when the only sample was taken this millisecond
    const elapsed = (now - relevantSamples[0].timestamp) / 1000;
    const actualWindow = elapsed > 0 ? elapsed : 1;

    return {
      bytesInPerSec: Math.round(totalBytesIn / actualWindow),
      bytesOutPerSec: Math.round(totalBytesOut / actualWindow)
    };
  }
}
```
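How the tracker is driven is left open above; the sketch below shows one plausible wiring, using the `ThroughputTracker` defined here: a 1-second `setInterval` drives `takeSample()`, and the proxy's socket data handlers call `recordBytes()`. The `trackSocketPair` helper and the socket names are illustrative assumptions, not part of the planned API.

```typescript
import * as net from 'net';

// Minimal wiring sketch using the ThroughputTracker defined above.
const throughput = new ThroughputTracker();

// Sample once per second; unref() so the timer does not keep the process alive.
const sampleTimer = setInterval(() => throughput.takeSample(), 1000);
sampleTimer.unref();

// Hypothetical hook called when the proxy pairs a client socket with an upstream socket.
function trackSocketPair(clientSocket: net.Socket, upstreamSocket: net.Socket): void {
  // Bytes arriving from the client are counted as inbound traffic
  clientSocket.on('data', (chunk: Buffer) => throughput.recordBytes(chunk.length, 0));
  // Bytes arriving from the upstream are counted as outbound traffic back to the client
  upstreamSocket.on('data', (chunk: Buffer) => throughput.recordBytes(0, chunk.length));
}

// Querying rates for different windows, e.g. instant (1s) and average (60s)
const instant = throughput.getRate(1);
const average = throughput.getRate(60);
console.log(`instant in: ${instant.bytesInPerSec} B/s, average out: ${average.bytesOutPerSec} B/s`);
```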
### B. Connection-Level Byte Tracking

```typescript
// In ConnectionRecord, add:
interface IConnectionRecord {
  // ... existing fields ...

  // Byte counters with timestamps
  bytesReceivedHistory: Array<{ timestamp: number; bytes: number }>;
  bytesSentHistory: Array<{ timestamp: number; bytes: number }>;

  // For efficiency, could use circular buffer
  lastBytesReceivedUpdate: number;
  lastBytesSentUpdate: number;
}
```

### C. Enhanced Metrics Interface

```typescript
interface IMetrics {
  // Connection metrics
  connections: {
    active(): number;
    total(): number;
    byRoute(): Map<string, number>;
    byIP(): Map<string, number>;
    topIPs(limit?: number): Array<{ ip: string; count: number }>;
  };

  // Throughput metrics (bytes per second)
  throughput: {
    instant(): { in: number; out: number };   // Last 1 second
    recent(): { in: number; out: number };    // Last 10 seconds
    average(): { in: number; out: number };   // Last 60 seconds
    custom(seconds: number): { in: number; out: number };
    history(seconds: number): Array<{ timestamp: number; in: number; out: number }>;
    byRoute(windowSeconds?: number): Map<string, { in: number; out: number }>;
    byIP(windowSeconds?: number): Map<string, { in: number; out: number }>;
  };

  // Request metrics
  requests: {
    perSecond(): number;
    perMinute(): number;
    total(): number;
  };

  // Cumulative totals
  totals: {
    bytesIn(): number;
    bytesOut(): number;
    connections(): number;
  };

  // Performance metrics
  percentiles: {
    connectionDuration(): { p50: number; p95: number; p99: number };
    bytesTransferred(): {
      in: { p50: number; p95: number; p99: number };
      out: { p50: number; p95: number; p99: number };
    };
  };
}
```

## 3. Implementation Plan

### Current Status

- **Phase 1**: ~90% complete (core functionality implemented, tests need fixing)
- **Phase 2**: ~60% complete (main features done, percentiles pending)
- **Phase 3**: ~40% complete (basic optimizations in place)
- **Phase 4**: 0% complete (export formats not started)

### Phase 1: Core Throughput Tracking (Week 1)

- [x] Implement `ThroughputTracker` class
- [x] Integrate byte recording into socket data handlers
- [x] Add periodic sampling (1-second intervals)
- [x] Update `getThroughputRate()` to use time-series data (replaced with new clean API)
- [ ] Add unit tests for throughput tracking

### Phase 2: Enhanced Metrics (Week 2)

- [x] Add configurable time windows (1s, 10s, 60s, 5m, etc.)
- [ ] Implement percentile calculations
- [x] Add route-specific and IP-specific throughput tracking
- [x] Create historical data access methods
- [ ] Add integration tests

### Phase 3: Performance Optimization (Week 3)

- [x] Use circular buffers for efficiency
- [ ] Implement data aggregation for longer time windows
- [x] Add configurable retention periods
- [ ] Optimize memory usage
- [ ] Add performance benchmarks

### Phase 4: Export Formats (Week 4)

- [ ] Add Prometheus metric format with proper metric types
- [ ] Add StatsD format support
- [ ] Add JSON export with metadata
- [ ] Create OpenMetrics compatibility
- [ ] Add documentation and examples

## 4. Key Design Decisions

### A. Sampling Strategy

- **1-second samples** for fine-grained data
- **Aggregate to 1-minute** for longer retention (see the rollup sketch after this section)
- **Keep 1 hour** of second-level data
- **Keep 24 hours** of minute-level data

### B. Memory Management

- **Circular buffers** for fixed memory usage
- **Configurable retention** periods
- **Lazy aggregation** for older data
- **Efficient data structures** (typed arrays for samples)

### C. Performance Considerations

- **Batch updates** during high throughput
- **Debounced calculations** for expensive metrics
- **Cached results** with TTL
- **Worker thread** option for heavy calculations
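The 1-second to 1-minute rollup referenced in the sampling strategy can be a simple reduction over the second-level samples. The sketch below is one possible shape; `IMinuteSample` and `aggregateToMinutes` are illustrative names, not part of the planned API.

```typescript
// Sketch of the second-to-minute rollup used for longer retention.
// Reuses IThroughputSample from section 2.A; everything else is illustrative.
interface IMinuteSample {
  timestamp: number;    // start of the minute bucket
  bytesIn: number;      // total bytes in during that minute
  bytesOut: number;     // total bytes out during that minute
  sampleCount: number;  // how many 1-second samples contributed
}

function aggregateToMinutes(samples: IThroughputSample[]): IMinuteSample[] {
  const buckets = new Map<number, IMinuteSample>();

  for (const s of samples) {
    // Truncate the timestamp to the start of its minute
    const minuteStart = Math.floor(s.timestamp / 60000) * 60000;
    let bucket = buckets.get(minuteStart);
    if (!bucket) {
      bucket = { timestamp: minuteStart, bytesIn: 0, bytesOut: 0, sampleCount: 0 };
      buckets.set(minuteStart, bucket);
    }
    bucket.bytesIn += s.bytesIn;
    bucket.bytesOut += s.bytesOut;
    bucket.sampleCount += 1;
  }

  // Return buckets in chronological order
  return [...buckets.values()].sort((a, b) => a.timestamp - b.timestamp);
}
```

Minute-level buckets can then be retained for 24 hours at roughly 1/60th of the memory cost of keeping the raw second-level samples.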
## 5. Configuration Options

```typescript
interface IMetricsConfig {
  enabled: boolean;

  // Sampling configuration
  sampleIntervalMs: number;        // Default: 1000 (1 second)
  retentionSeconds: number;        // Default: 3600 (1 hour)

  // Performance tuning
  enableDetailedTracking: boolean; // Per-connection byte history
  enablePercentiles: boolean;      // Calculate percentiles
  cacheResultsMs: number;          // Cache expensive calculations

  // Export configuration
  prometheusEnabled: boolean;
  prometheusPath: string;          // Default: /metrics
  prometheusPrefix: string;        // Default: smartproxy_
}
```

## 6. Example Usage

```typescript
const proxy = new SmartProxy({
  metrics: {
    enabled: true,
    sampleIntervalMs: 1000,
    enableDetailedTracking: true
  }
});

// Get metrics instance
const metrics = proxy.getMetrics();

// Connection metrics
console.log(`Active connections: ${metrics.connections.active()}`);
console.log(`Total connections: ${metrics.connections.total()}`);

// Throughput metrics
const instant = metrics.throughput.instant();
console.log(`Current: ${instant.in} bytes/sec in, ${instant.out} bytes/sec out`);

const recent = metrics.throughput.recent();   // Last 10 seconds
const average = metrics.throughput.average(); // Last 60 seconds

// Custom time window
const custom = metrics.throughput.custom(30); // Last 30 seconds

// Historical data for graphing
const history = metrics.throughput.history(300); // Last 5 minutes
history.forEach(point => {
  console.log(`${new Date(point.timestamp)}: ${point.in} bytes/sec in, ${point.out} bytes/sec out`);
});

// Top routes by throughput
const routeThroughput = metrics.throughput.byRoute(60);
routeThroughput.forEach((stats, route) => {
  console.log(`Route ${route}: ${stats.in} bytes/sec in, ${stats.out} bytes/sec out`);
});

// Request metrics
console.log(`RPS: ${metrics.requests.perSecond()}`);
console.log(`RPM: ${metrics.requests.perMinute()}`);

// Totals
console.log(`Total bytes in: ${metrics.totals.bytesIn()}`);
console.log(`Total bytes out: ${metrics.totals.bytesOut()}`);
```

## 7. Prometheus Export Example

```
# HELP smartproxy_throughput_bytes_per_second Current throughput in bytes per second
# TYPE smartproxy_throughput_bytes_per_second gauge
smartproxy_throughput_bytes_per_second{direction="in",window="1s"} 1234567
smartproxy_throughput_bytes_per_second{direction="out",window="1s"} 987654
smartproxy_throughput_bytes_per_second{direction="in",window="10s"} 1134567
smartproxy_throughput_bytes_per_second{direction="out",window="10s"} 887654

# HELP smartproxy_bytes_total Total bytes transferred
# TYPE smartproxy_bytes_total counter
smartproxy_bytes_total{direction="in"} 123456789
smartproxy_bytes_total{direction="out"} 98765432

# HELP smartproxy_active_connections Current number of active connections
# TYPE smartproxy_active_connections gauge
smartproxy_active_connections 42

# HELP smartproxy_connection_duration_seconds Connection duration in seconds
# TYPE smartproxy_connection_duration_seconds histogram
smartproxy_connection_duration_seconds_bucket{le="0.1"} 100
smartproxy_connection_duration_seconds_bucket{le="1"} 500
smartproxy_connection_duration_seconds_bucket{le="10"} 800
smartproxy_connection_duration_seconds_bucket{le="+Inf"} 850
smartproxy_connection_duration_seconds_sum 4250
smartproxy_connection_duration_seconds_count 850
```
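Output like the above could be rendered directly from the `IMetrics` API. The sketch below covers only the gauge and counter families; `toPrometheus`, the chosen windows, and the prefix handling are assumptions, and a real exporter would also emit the histogram buckets.

```typescript
// Sketch: render a subset of the Prometheus text exposition format from IMetrics.
// toPrometheus and the selected label sets are illustrative, not the final exporter.
function toPrometheus(metrics: IMetrics, prefix: string = 'smartproxy_'): string {
  const lines: string[] = [];

  lines.push(`# HELP ${prefix}throughput_bytes_per_second Current throughput in bytes per second`);
  lines.push(`# TYPE ${prefix}throughput_bytes_per_second gauge`);
  const windows: Array<[string, { in: number; out: number }]> = [
    ['1s', metrics.throughput.instant()],
    ['10s', metrics.throughput.recent()],
    ['60s', metrics.throughput.average()],
  ];
  for (const [window, rate] of windows) {
    lines.push(`${prefix}throughput_bytes_per_second{direction="in",window="${window}"} ${rate.in}`);
    lines.push(`${prefix}throughput_bytes_per_second{direction="out",window="${window}"} ${rate.out}`);
  }

  lines.push(`# HELP ${prefix}bytes_total Total bytes transferred`);
  lines.push(`# TYPE ${prefix}bytes_total counter`);
  lines.push(`${prefix}bytes_total{direction="in"} ${metrics.totals.bytesIn()}`);
  lines.push(`${prefix}bytes_total{direction="out"} ${metrics.totals.bytesOut()}`);

  lines.push(`# HELP ${prefix}active_connections Current number of active connections`);
  lines.push(`# TYPE ${prefix}active_connections gauge`);
  lines.push(`${prefix}active_connections ${metrics.connections.active()}`);

  // The text exposition format requires a trailing newline
  return lines.join('\n') + '\n';
}
```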
## 8. Migration Strategy

### Breaking Changes

- Completely replace the old metrics API with the new clean design
- Remove all `get*` prefixed methods in favor of grouped properties
- Use simple `{ in, out }` objects instead of verbose property names
- Provide clear migration guide in documentation

### Implementation Approach

1. ✅ Create new `ThroughputTracker` class for time-series data
2. ✅ Implement new `IMetrics` interface with clean API
3. ✅ Replace `MetricsCollector` implementation entirely
4. ✅ Update all references to use new API
5. ⚠️ Add comprehensive tests for accuracy validation (partial)

### Additional Refactoring Completed

- Refactored all SmartProxy components to use cleaner dependency pattern
- Components now receive only `SmartProxy` instance instead of individual dependencies
- Access to other components via `this.smartProxy.componentName`
- Significantly simplified constructor signatures across the codebase

## 9. Success Metrics

- **Accuracy**: Throughput metrics accurate within 1% of actual
- **Performance**: < 1% CPU overhead for metrics collection
- **Memory**: < 10MB memory usage for 1 hour of data
- **Latency**: < 1ms to retrieve any metric
- **Reliability**: No metrics data loss under load

## 10. Future Enhancements

### Phase 5: Advanced Analytics

- Anomaly detection for traffic patterns
- Predictive analytics for capacity planning
- Correlation analysis between routes
- Real-time alerting integration

### Phase 6: Distributed Metrics

- Metrics aggregation across multiple proxies
- Distributed time-series storage
- Cross-proxy analytics
- Global dashboard support

## 11. Risks and Mitigations

### Risk: Memory Usage

- **Mitigation**: Circular buffers and configurable retention
- **Monitoring**: Track memory usage per metric type

### Risk: Performance Impact

- **Mitigation**: Efficient data structures and caching
- **Testing**: Load test with metrics enabled/disabled

### Risk: Data Accuracy

- **Mitigation**: Atomic operations and proper synchronization
- **Validation**: Compare with external monitoring tools

## Conclusion

This plan transforms SmartProxy's metrics from a basic cumulative system to a comprehensive, time-series based monitoring solution suitable for production environments. The phased approach ensures minimal disruption while delivering immediate value through accurate throughput measurements.