2025-06-22 22:28:37 +00:00
# SmartProxy Metrics Improvement Plan
## Overview
The current `getThroughputRate()` implementation calculates cumulative throughput over a 60-second window rather than providing an actual rate, making metrics misleading for monitoring systems. This plan outlines a comprehensive redesign of the metrics system to provide accurate, time-series based metrics suitable for production monitoring.
## 1. Core Issues with Current Implementation
- **Cumulative vs Rate**: Current method accumulates all bytes from connections in the last minute rather than calculating actual throughput rate
- **No Time-Series Data**: Cannot track throughput changes over time
- **Inaccurate Estimates**: Attempting to estimate rates for older connections is fundamentally flawed
- **No Sliding Windows**: Cannot provide different time window views (1s, 10s, 60s, etc.)
- **Limited Granularity**: Only provides a single 60-second view
## 2. Proposed Architecture
### A. Time-Series Throughput Tracking
```typescript
interface IThroughputSample {
timestamp: number;
bytesIn: number;
bytesOut: number;
}
class ThroughputTracker {
private samples: IThroughputSample[] = [];
private readonly MAX_SAMPLES = 3600; // 1 hour at 1 sample/second
private lastSampleTime: number = 0;
private accumulatedBytesIn: number = 0;
private accumulatedBytesOut: number = 0;
// Called on every data transfer
public recordBytes(bytesIn: number, bytesOut: number): void {
this.accumulatedBytesIn += bytesIn;
this.accumulatedBytesOut += bytesOut;
}
// Called periodically (every second)
public takeSample(): void {
const now = Date.now();
// Record accumulated bytes since last sample
this.samples.push({
timestamp: now,
bytesIn: this.accumulatedBytesIn,
bytesOut: this.accumulatedBytesOut
});
// Reset accumulators
this.accumulatedBytesIn = 0;
this.accumulatedBytesOut = 0;
// Trim old samples
const cutoff = now - 3600000; // 1 hour
this.samples = this.samples.filter(s => s.timestamp > cutoff);
}
// Get rate over specified window
public getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
const relevantSamples = this.samples.filter(s => s.timestamp > windowStart);
if (relevantSamples.length === 0) {
return { bytesInPerSec: 0, bytesOutPerSec: 0 };
}
const totalBytesIn = relevantSamples.reduce((sum, s) => sum + s.bytesIn, 0);
const totalBytesOut = relevantSamples.reduce((sum, s) => sum + s.bytesOut, 0);
const actualWindow = (now - relevantSamples[0].timestamp) / 1000;
return {
bytesInPerSec: Math.round(totalBytesIn / actualWindow),
bytesOutPerSec: Math.round(totalBytesOut / actualWindow)
};
}
}
```
### B. Connection-Level Byte Tracking
```typescript
// In ConnectionRecord, add:
interface IConnectionRecord {
// ... existing fields ...
// Byte counters with timestamps
bytesReceivedHistory: Array< { timestamp: number; bytes: number }>;
bytesSentHistory: Array< { timestamp: number; bytes: number }>;
// For efficiency, could use circular buffer
lastBytesReceivedUpdate: number;
lastBytesSentUpdate: number;
}
```
### C. Enhanced Metrics Interface
```typescript
interface IMetrics {
// Connection metrics
connections: {
active(): number;
total(): number;
byRoute(): Map< string , number > ;
byIP(): Map< string , number > ;
topIPs(limit?: number): Array< { ip: string; count: number }>;
};
// Throughput metrics (bytes per second)
throughput: {
instant(): { in: number; out: number }; // Last 1 second
recent(): { in: number; out: number }; // Last 10 seconds
average(): { in: number; out: number }; // Last 60 seconds
custom(seconds: number): { in: number; out: number };
history(seconds: number): Array< { timestamp: number; in: number; out: number }>;
byRoute(windowSeconds?: number): Map< string , { in: number ; out: number } > ;
byIP(windowSeconds?: number): Map< string , { in: number ; out: number } > ;
};
// Request metrics
requests: {
perSecond(): number;
perMinute(): number;
total(): number;
};
// Cumulative totals
totals: {
bytesIn(): number;
bytesOut(): number;
connections(): number;
};
// Performance metrics
percentiles: {
connectionDuration(): { p50: number; p95: number; p99: number };
bytesTransferred(): {
in: { p50: number; p95: number; p99: number };
out: { p50: number; p95: number; p99: number };
};
};
}
```
## 3. Implementation Plan
### Current Status
- **Phase 1**: ~90% complete (core functionality implemented, tests need fixing)
- **Phase 2**: ~60% complete (main features done, percentiles pending)
- **Phase 3**: ~40% complete (basic optimizations in place)
- **Phase 4**: 0% complete (export formats not started)
### Phase 1: Core Throughput Tracking (Week 1)
- [x] Implement `ThroughputTracker` class
- [x] Integrate byte recording into socket data handlers
- [x] Add periodic sampling (1-second intervals)
- [x] Update `getThroughputRate()` to use time-series data (replaced with new clean API)
- [ ] Add unit tests for throughput tracking
### Phase 2: Enhanced Metrics (Week 2)
- [x] Add configurable time windows (1s, 10s, 60s, 5m, etc.)
- [ ] Implement percentile calculations
- [x] Add route-specific and IP-specific throughput tracking
- [x] Create historical data access methods
- [ ] Add integration tests
### Phase 3: Performance Optimization (Week 3)
- [x] Use circular buffers for efficiency
- [ ] Implement data aggregation for longer time windows
- [x] Add configurable retention periods
- [ ] Optimize memory usage
- [ ] Add performance benchmarks
### Phase 4: Export Formats (Week 4)
- [ ] Add Prometheus metric format with proper metric types
- [ ] Add StatsD format support
- [ ] Add JSON export with metadata
- [ ] Create OpenMetrics compatibility
- [ ] Add documentation and examples
## 4. Key Design Decisions
### A. Sampling Strategy
- **1-second samples** for fine-grained data
- **Aggregate to 1-minute** for longer retention
- **Keep 1 hour** of second-level data
- **Keep 24 hours** of minute-level data
### B. Memory Management
- **Circular buffers** for fixed memory usage
- **Configurable retention** periods
- **Lazy aggregation** for older data
- **Efficient data structures** (typed arrays for samples)
### C. Performance Considerations
- **Batch updates** during high throughput
- **Debounced calculations** for expensive metrics
- **Cached results** with TTL
- **Worker thread** option for heavy calculations
## 5. Configuration Options
```typescript
interface IMetricsConfig {
enabled: boolean;
// Sampling configuration
sampleIntervalMs: number; // Default: 1000 (1 second)
retentionSeconds: number; // Default: 3600 (1 hour)
// Performance tuning
enableDetailedTracking: boolean; // Per-connection byte history
enablePercentiles: boolean; // Calculate percentiles
cacheResultsMs: number; // Cache expensive calculations
// Export configuration
prometheusEnabled: boolean;
prometheusPath: string; // Default: /metrics
prometheusPrefix: string; // Default: smartproxy_
}
```
## 6. Example Usage
```typescript
const proxy = new SmartProxy({
metrics: {
enabled: true,
sampleIntervalMs: 1000,
enableDetailedTracking: true
}
});
// Get metrics instance
const metrics = proxy.getMetrics();
// Connection metrics
console.log(`Active connections: ${metrics.connections.active()}` );
console.log(`Total connections: ${metrics.connections.total()}` );
// Throughput metrics
const instant = metrics.throughput.instant();
console.log(`Current: ${instant.in} bytes/sec in, ${instant.out} bytes/sec out` );
const recent = metrics.throughput.recent(); // Last 10 seconds
const average = metrics.throughput.average(); // Last 60 seconds
// Custom time window
const custom = metrics.throughput.custom(30); // Last 30 seconds
// Historical data for graphing
const history = metrics.throughput.history(300); // Last 5 minutes
history.forEach(point => {
console.log(`${new Date(point.timestamp)}: ${point.in} bytes/sec in, ${point.out} bytes/sec out` );
});
// Top routes by throughput
const routeThroughput = metrics.throughput.byRoute(60);
routeThroughput.forEach((stats, route) => {
console.log(`Route ${route}: ${stats.in} bytes/sec in, ${stats.out} bytes/sec out` );
});
// Request metrics
console.log(`RPS: ${metrics.requests.perSecond()}` );
console.log(`RPM: ${metrics.requests.perMinute()}` );
// Totals
console.log(`Total bytes in: ${metrics.totals.bytesIn()}` );
console.log(`Total bytes out: ${metrics.totals.bytesOut()}` );
```
## 7. Prometheus Export Example
```
# HELP smartproxy_throughput_bytes_per_second Current throughput in bytes per second
# TYPE smartproxy_throughput_bytes_per_second gauge
smartproxy_throughput_bytes_per_second{direction="in",window="1s"} 1234567
smartproxy_throughput_bytes_per_second{direction="out",window="1s"} 987654
smartproxy_throughput_bytes_per_second{direction="in",window="10s"} 1134567
smartproxy_throughput_bytes_per_second{direction="out",window="10s"} 887654
# HELP smartproxy_bytes_total Total bytes transferred
# TYPE smartproxy_bytes_total counter
smartproxy_bytes_total{direction="in"} 123456789
smartproxy_bytes_total{direction="out"} 98765432
# HELP smartproxy_active_connections Current number of active connections
# TYPE smartproxy_active_connections gauge
smartproxy_active_connections 42
# HELP smartproxy_connection_duration_seconds Connection duration in seconds
# TYPE smartproxy_connection_duration_seconds histogram
smartproxy_connection_duration_seconds_bucket{le="0.1"} 100
smartproxy_connection_duration_seconds_bucket{le="1"} 500
smartproxy_connection_duration_seconds_bucket{le="10"} 800
smartproxy_connection_duration_seconds_bucket{le="+Inf"} 850
smartproxy_connection_duration_seconds_sum 4250
smartproxy_connection_duration_seconds_count 850
```
## 8. Migration Strategy
### Breaking Changes
- Completely replace the old metrics API with the new clean design
- Remove all `get*` prefixed methods in favor of grouped properties
- Use simple `{ in, out }` objects instead of verbose property names
- Provide clear migration guide in documentation
### Implementation Approach
2025-06-22 23:10:56 +00:00
1. ✅ Create new `ThroughputTracker` class for time-series data
2. ✅ Implement new `IMetrics` interface with clean API
3. ✅ Replace `MetricsCollector` implementation entirely
4. ✅ Update all references to use new API
5. ⚠️ Add comprehensive tests for accuracy validation (partial)
### Additional Refactoring Completed
- Refactored all SmartProxy components to use cleaner dependency pattern
- Components now receive only `SmartProxy` instance instead of individual dependencies
- Access to other components via `this.smartProxy.componentName`
- Significantly simplified constructor signatures across the codebase
2025-06-22 22:28:37 +00:00
## 9. Success Metrics
- **Accuracy**: Throughput metrics accurate within 1% of actual
- **Performance**: < 1 % CPU overhead for metrics collection
- **Memory**: < 10MB memory usage for 1 hour of data
- **Latency**: < 1ms to retrieve any metric
- **Reliability**: No metrics data loss under load
## 10. Future Enhancements
### Phase 5: Advanced Analytics
- Anomaly detection for traffic patterns
- Predictive analytics for capacity planning
- Correlation analysis between routes
- Real-time alerting integration
### Phase 6: Distributed Metrics
- Metrics aggregation across multiple proxies
- Distributed time-series storage
- Cross-proxy analytics
- Global dashboard support
## 11. Risks and Mitigations
### Risk: Memory Usage
- **Mitigation**: Circular buffers and configurable retention
- **Monitoring**: Track memory usage per metric type
### Risk: Performance Impact
- **Mitigation**: Efficient data structures and caching
- **Testing**: Load test with metrics enabled/disabled
### Risk: Data Accuracy
- **Mitigation**: Atomic operations and proper synchronization
- **Validation**: Compare with external monitoring tools
## Conclusion
This plan transforms SmartProxy's metrics from a basic cumulative system to a comprehensive, time-series based monitoring solution suitable for production environments. The phased approach ensures minimal disruption while delivering immediate value through accurate throughput measurements.