SmartProxy Metrics Improvement Plan
Overview
The current getThroughputRate() implementation calculates cumulative throughput over a 60-second window rather than providing an actual rate, making metrics misleading for monitoring systems. This plan outlines a comprehensive redesign of the metrics system to provide accurate, time-series based metrics suitable for production monitoring.
1. Core Issues with Current Implementation
- Cumulative vs Rate: Current method accumulates all bytes from connections in the last minute rather than calculating actual throughput rate
- No Time-Series Data: Cannot track throughput changes over time
- Inaccurate Estimates: Attempting to estimate rates for older connections is fundamentally flawed
- No Sliding Windows: Cannot provide different time window views (1s, 10s, 60s, etc.)
- Limited Granularity: Only provides a single 60-second view
2. Proposed Architecture
A. Time-Series Throughput Tracking
interface IThroughputSample {
timestamp: number;
bytesIn: number;
bytesOut: number;
}
class ThroughputTracker {
private samples: IThroughputSample[] = [];
private readonly MAX_SAMPLES = 3600; // 1 hour at 1 sample/second
private lastSampleTime: number = 0;
private accumulatedBytesIn: number = 0;
private accumulatedBytesOut: number = 0;
// Called on every data transfer
public recordBytes(bytesIn: number, bytesOut: number): void {
this.accumulatedBytesIn += bytesIn;
this.accumulatedBytesOut += bytesOut;
}
// Called periodically (every second)
public takeSample(): void {
const now = Date.now();
// Record accumulated bytes since last sample
this.samples.push({
timestamp: now,
bytesIn: this.accumulatedBytesIn,
bytesOut: this.accumulatedBytesOut
});
// Reset accumulators
this.accumulatedBytesIn = 0;
this.accumulatedBytesOut = 0;
// Trim old samples
const cutoff = now - this.MAX_SAMPLES * 1000; // 1 hour at 1 sample/second
this.samples = this.samples.filter(s => s.timestamp > cutoff);
}
// Get rate over specified window
public getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
const relevantSamples = this.samples.filter(s => s.timestamp > windowStart);
if (relevantSamples.length === 0) {
return { bytesInPerSec: 0, bytesOutPerSec: 0 };
}
const totalBytesIn = relevantSamples.reduce((sum, s) => sum + s.bytesIn, 0);
const totalBytesOut = relevantSamples.reduce((sum, s) => sum + s.bytesOut, 0);
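// Clamp to at least 1 second (the sampling interval): a sample taken in the
// same millisecond as `now` would otherwise produce a division by zero
const actualWindow = Math.max((now - relevantSamples[0].timestamp) / 1000, 1);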
return {
bytesInPerSec: Math.round(totalBytesIn / actualWindow),
bytesOutPerSec: Math.round(totalBytesOut / actualWindow)
};
}
}
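The window arithmetic in getRate() is easiest to check with fixed timestamps. The sketch below is a pure variant written for illustration only: rateOver, its now argument, and the sampleIntervalMs parameter are hypothetical names that are not part of the plan. It also assumes that each sample covers the interval ending at its timestamp, so it divides by the time the samples cover rather than by the distance from the first sample to now.

```typescript
interface IThroughputSample {
  timestamp: number;
  bytesIn: number;
  bytesOut: number;
}

// Hypothetical pure variant of getRate(): `now` is passed in so results are reproducible.
function rateOver(
  samples: IThroughputSample[],
  windowSeconds: number,
  now: number,
  sampleIntervalMs: number = 1000 // assumption: matches the 1-second sampling in the plan
): { bytesInPerSec: number; bytesOutPerSec: number } {
  const windowStart = now - windowSeconds * 1000;
  const relevant = samples.filter(s => s.timestamp > windowStart);
  if (relevant.length === 0) {
    return { bytesInPerSec: 0, bytesOutPerSec: 0 };
  }
  const totalIn = relevant.reduce((sum, s) => sum + s.bytesIn, 0);
  const totalOut = relevant.reduce((sum, s) => sum + s.bytesOut, 0);
  // Assumption: each sample holds the bytes of the interval that ended at its
  // timestamp, so N samples cover N * sampleIntervalMs of traffic.
  const coveredSeconds = (relevant.length * sampleIntervalMs) / 1000;
  return {
    bytesInPerSec: Math.round(totalIn / coveredSeconds),
    bytesOutPerSec: Math.round(totalOut / coveredSeconds)
  };
}

const demoSamples: IThroughputSample[] = [
  { timestamp: 1000, bytesIn: 100, bytesOut: 10 },
  { timestamp: 2000, bytesIn: 200, bytesOut: 20 },
  { timestamp: 3000, bytesIn: 300, bytesOut: 30 }
];
const lastTwoSeconds = rateOver(demoSamples, 2, 3000);
const lastTenSeconds = rateOver(demoSamples, 10, 3000);
```

With these inputs the 2-second window contains the samples at 2000 and 3000 (500 bytes in over 2 seconds), and the 10-second window contains all three samples (600 bytes in over 3 seconds).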
B. Connection-Level Byte Tracking
// In ConnectionRecord, add:
interface IConnectionRecord {
// ... existing fields ...
// Byte counters with timestamps
bytesReceivedHistory: Array<{ timestamp: number; bytes: number }>;
bytesSentHistory: Array<{ timestamp: number; bytes: number }>;
// For efficiency, could use circular buffer
lastBytesReceivedUpdate: number;
lastBytesSentUpdate: number;
}
C. Enhanced Metrics Interface
interface IMetrics {
// Connection metrics
connections: {
active(): number;
total(): number;
byRoute(): Map<string, number>;
byIP(): Map<string, number>;
topIPs(limit?: number): Array<{ ip: string; count: number }>;
};
// Throughput metrics (bytes per second)
throughput: {
instant(): { in: number; out: number }; // Last 1 second
recent(): { in: number; out: number }; // Last 10 seconds
average(): { in: number; out: number }; // Last 60 seconds
custom(seconds: number): { in: number; out: number };
history(seconds: number): Array<{ timestamp: number; in: number; out: number }>;
byRoute(windowSeconds?: number): Map<string, { in: number; out: number }>;
byIP(windowSeconds?: number): Map<string, { in: number; out: number }>;
};
// Request metrics
requests: {
perSecond(): number;
perMinute(): number;
total(): number;
};
// Cumulative totals
totals: {
bytesIn(): number;
bytesOut(): number;
connections(): number;
};
// Performance metrics
percentiles: {
connectionDuration(): { p50: number; p95: number; p99: number };
bytesTransferred(): {
in: { p50: number; p95: number; p99: number };
out: { p50: number; p95: number; p99: number };
};
};
}
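The percentiles group above does not say how p50, p95 and p99 are derived. One common choice is the nearest-rank method, sketched below. The helper names percentile and summarizePercentiles are hypothetical, and returning 0 for an empty input is an assumed convention rather than something the plan specifies.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of the
// data is less than or equal to it. Sorts a copy, so the input is not mutated.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    return 0; // assumption: empty input reports 0
  }
  const sorted = values.slice().sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for integer inputs.
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Produces the { p50, p95, p99 } shape used by the percentiles group.
function summarizePercentiles(values: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(values, 50),
    p95: percentile(values, 95),
    p99: percentile(values, 99)
  };
}

const durationSummary = summarizePercentiles([5, 1, 3, 2, 4]);
```

Sorting on every call costs O(n log n), which is one reason the plan lists cached results and an enablePercentiles switch.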
3. Implementation Plan
Current Status
- Phase 1: ~90% complete (core functionality implemented, tests need fixing)
- Phase 2: ~60% complete (main features done, percentiles pending)
- Phase 3: ~40% complete (basic optimizations in place)
- Phase 4: 0% complete (export formats not started)
Phase 1: Core Throughput Tracking (Week 1)
- Implement ThroughputTracker class
- Integrate byte recording into socket data handlers
- Add periodic sampling (1-second intervals)
- Update getThroughputRate() to use time-series data (replaced with new clean API)
- Add unit tests for throughput tracking
Phase 2: Enhanced Metrics (Week 2)
- Add configurable time windows (1s, 10s, 60s, 5m, etc.)
- Implement percentile calculations
- Add route-specific and IP-specific throughput tracking
- Create historical data access methods
- Add integration tests
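Route-specific and IP-specific throughput can reuse the accumulate-then-sample idea from ThroughputTracker, keyed by route name or IP address. The sketch below is one possible shape, not the plan's actual implementation: KeyedByteCounter, record and drain are hypothetical names.

```typescript
type ByteCounts = { in: number; out: number };

// Accumulates bytes per key (route name or IP) between two sampling ticks.
class KeyedByteCounter {
  private pending = new Map<string, ByteCounts>();

  // Called on every data transfer for the given key.
  public record(key: string, bytesIn: number, bytesOut: number): void {
    const entry = this.pending.get(key);
    if (entry) {
      entry.in += bytesIn;
      entry.out += bytesOut;
    } else {
      this.pending.set(key, { in: bytesIn, out: bytesOut });
    }
  }

  // Called once per sampling interval: returns the accumulated counts and
  // starts a fresh map, so idle keys do not linger in memory.
  public drain(): Map<string, ByteCounts> {
    const snapshot = this.pending;
    this.pending = new Map<string, ByteCounts>();
    return snapshot;
  }
}

const routeCounter = new KeyedByteCounter();
routeCounter.record('api', 100, 10);
routeCounter.record('api', 50, 5);
routeCounter.record('static', 1, 2);
const firstDrain = routeCounter.drain();
const secondDrain = routeCounter.drain();
```

Swapping the map on drain, instead of zeroing each entry, keeps the hot path in record() free of cleanup work.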
Phase 3: Performance Optimization (Week 3)
- Use circular buffers for efficiency
- Implement data aggregation for longer time windows
- Add configurable retention periods
- Optimize memory usage
- Add performance benchmarks
Phase 4: Export Formats (Week 4)
- Add Prometheus metric format with proper metric types
- Add StatsD format support
- Add JSON export with metadata
- Create OpenMetrics compatibility
- Add documentation and examples
4. Key Design Decisions
A. Sampling Strategy
- 1-second samples for fine-grained data
- Aggregate to 1-minute for longer retention
- Keep 1 hour of second-level data
- Keep 24 hours of minute-level data
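Rolling second-level samples up into minute-level buckets can be sketched as a grouping step on the minute boundary. The names below (SecondSample, MinuteBucket, aggregateToMinutes) are illustrative and not taken from the codebase. Buckets are keyed by the start of the minute in epoch milliseconds.

```typescript
interface SecondSample {
  timestamp: number; // epoch milliseconds
  bytesIn: number;
  bytesOut: number;
}

interface MinuteBucket {
  minuteStart: number; // epoch milliseconds, aligned to a 60-second boundary
  bytesIn: number;
  bytesOut: number;
}

// Sums 1-second samples into 1-minute buckets, returned in time order.
function aggregateToMinutes(samples: SecondSample[]): MinuteBucket[] {
  const buckets = new Map<number, MinuteBucket>();
  for (const s of samples) {
    const minuteStart = Math.floor(s.timestamp / 60000) * 60000;
    const bucket = buckets.get(minuteStart);
    if (bucket) {
      bucket.bytesIn += s.bytesIn;
      bucket.bytesOut += s.bytesOut;
    } else {
      buckets.set(minuteStart, { minuteStart, bytesIn: s.bytesIn, bytesOut: s.bytesOut });
    }
  }
  return Array.from(buckets.values()).sort((a, b) => a.minuteStart - b.minuteStart);
}

const minuteBuckets = aggregateToMinutes([
  { timestamp: 0, bytesIn: 1, bytesOut: 10 },
  { timestamp: 59999, bytesIn: 2, bytesOut: 20 },
  { timestamp: 60000, bytesIn: 3, bytesOut: 30 },
  { timestamp: 125000, bytesIn: 4, bytesOut: 40 }
]);
```

At these retention settings the minute level holds 24 × 60 = 1440 buckets, compared with 3600 second-level samples for the most recent hour.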
B. Memory Management
- Circular buffers for fixed memory usage
- Configurable retention periods
- Lazy aggregation for older data
- Efficient data structures (typed arrays for samples)
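A fixed-capacity ring buffer over typed arrays is one way to get the constant memory footprint described above. The class below is a minimal sketch under that assumption. SampleRing and its method names are hypothetical, and Float64Array is chosen so that millisecond timestamps and large byte counts stay exact.

```typescript
// Fixed-size circular buffer of throughput samples backed by typed arrays.
class SampleRing {
  private readonly timestamps: Float64Array;
  private readonly bytesIn: Float64Array;
  private readonly bytesOut: Float64Array;
  private head = 0; // index of the next slot to write
  private count = 0; // number of valid samples, at most `capacity`

  constructor(private readonly capacity: number) {
    this.timestamps = new Float64Array(capacity);
    this.bytesIn = new Float64Array(capacity);
    this.bytesOut = new Float64Array(capacity);
  }

  // Appends a sample, overwriting the oldest one once the buffer is full.
  public push(timestamp: number, bytesIn: number, bytesOut: number): void {
    this.timestamps[this.head] = timestamp;
    this.bytesIn[this.head] = bytesIn;
    this.bytesOut[this.head] = bytesOut;
    this.head = (this.head + 1) % this.capacity;
    if (this.count < this.capacity) {
      this.count++;
    }
  }

  public get size(): number {
    return this.count;
  }

  // Sums bytes of all samples newer than `cutoff`, walking newest to oldest.
  // Relies on samples having been pushed in increasing timestamp order.
  public sumSince(cutoff: number): { in: number; out: number } {
    let totalIn = 0;
    let totalOut = 0;
    for (let i = 0; i < this.count; i++) {
      const idx = (this.head - 1 - i + this.capacity) % this.capacity;
      if (this.timestamps[idx] <= cutoff) {
        break;
      }
      totalIn += this.bytesIn[idx];
      totalOut += this.bytesOut[idx];
    }
    return { in: totalIn, out: totalOut };
  }
}

const ring = new SampleRing(3);
ring.push(1000, 10, 1);
ring.push(2000, 20, 2);
ring.push(3000, 30, 3);
ring.push(4000, 40, 4); // overwrites the sample at 1000
const allRetained = ring.sumSince(0);
const newerThan2000 = ring.sumSince(2000);
```

For the one-hour retention in this plan, three Float64Array buffers of 3600 entries occupy 3 × 3600 × 8 = 86,400 bytes, and no per-sample objects or array re-filtering are needed on each tick.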
C. Performance Considerations
- Batch updates during high throughput
- Debounced calculations for expensive metrics
- Cached results with TTL
- Worker thread option for heavy calculations
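The "cached results with TTL" item can be illustrated with a small wrapper that recomputes a value only after its time-to-live has expired. This is a sketch rather than the project's implementation: TtlCache is a hypothetical name, and the injectable now function exists only to make the behaviour testable.

```typescript
// Caches the result of an expensive computation for `ttlMs` milliseconds.
class TtlCache<T> {
  private value: T | undefined;
  private hasValue = false;
  private expiresAt = 0;

  constructor(
    private readonly ttlMs: number,
    private readonly compute: () => T,
    private readonly now: () => number = () => Date.now()
  ) {}

  // Returns the cached value, recomputing it if it is missing or expired.
  public get(): T {
    const current = this.now();
    if (!this.hasValue || current >= this.expiresAt) {
      this.value = this.compute();
      this.hasValue = true;
      this.expiresAt = current + this.ttlMs;
    }
    return this.value as T;
  }
}

// Demonstration with a manual clock and a counter standing in for an
// expensive metric such as a percentile calculation.
let fakeClock = 0;
let computeCalls = 0;
const cachedMetric = new TtlCache<number>(100, () => ++computeCalls, () => fakeClock);
const firstRead = cachedMetric.get(); // computes
fakeClock = 50;
const secondRead = cachedMetric.get(); // served from cache
fakeClock = 100;
const thirdRead = cachedMetric.get(); // TTL expired, recomputes
```

A cache like this bounds how often an expensive metric is recalculated, at the cost of results being up to ttlMs stale.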
5. Configuration Options
interface IMetricsConfig {
enabled: boolean;
// Sampling configuration
sampleIntervalMs: number; // Default: 1000 (1 second)
retentionSeconds: number; // Default: 3600 (1 hour)
// Performance tuning
enableDetailedTracking: boolean; // Per-connection byte history
enablePercentiles: boolean; // Calculate percentiles
cacheResultsMs: number; // Cache expensive calculations
// Export configuration
prometheusEnabled: boolean;
prometheusPath: string; // Default: /metrics
prometheusPrefix: string; // Default: smartproxy_
}
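Callers will usually supply only some of these options, so a resolver that merges user settings over defaults is a natural companion. In the sketch below, the defaults for sampleIntervalMs, retentionSeconds, prometheusPath and prometheusPrefix come from the comments above. Every other default, along with the resolveMetricsConfig name and the validation rules, is an assumption made for illustration.

```typescript
interface IMetricsConfig {
  enabled: boolean;
  sampleIntervalMs: number;
  retentionSeconds: number;
  enableDetailedTracking: boolean;
  enablePercentiles: boolean;
  cacheResultsMs: number;
  prometheusEnabled: boolean;
  prometheusPath: string;
  prometheusPrefix: string;
}

const DEFAULT_METRICS_CONFIG: IMetricsConfig = {
  enabled: true, // assumption: not stated in the plan
  sampleIntervalMs: 1000, // from the plan: 1 second
  retentionSeconds: 3600, // from the plan: 1 hour
  enableDetailedTracking: false, // assumption
  enablePercentiles: false, // assumption
  cacheResultsMs: 1000, // assumption
  prometheusEnabled: false, // assumption
  prometheusPath: '/metrics', // from the plan
  prometheusPrefix: 'smartproxy_' // from the plan
};

// Hypothetical helper: merges overrides onto the defaults and rejects
// values that would make sampling or retention meaningless.
function resolveMetricsConfig(overrides: Partial<IMetricsConfig> = {}): IMetricsConfig {
  const config: IMetricsConfig = { ...DEFAULT_METRICS_CONFIG, ...overrides };
  if (config.sampleIntervalMs <= 0) {
    throw new RangeError('sampleIntervalMs must be positive');
  }
  if (config.retentionSeconds <= 0) {
    throw new RangeError('retentionSeconds must be positive');
  }
  return config;
}

const resolvedConfig = resolveMetricsConfig({ sampleIntervalMs: 500, enableDetailedTracking: true });
```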
6. Example Usage
const proxy = new SmartProxy({
metrics: {
enabled: true,
sampleIntervalMs: 1000,
enableDetailedTracking: true
}
});
// Get metrics instance
const metrics = proxy.getMetrics();
// Connection metrics
console.log(`Active connections: ${metrics.connections.active()}`);
console.log(`Total connections: ${metrics.connections.total()}`);
// Throughput metrics
const instant = metrics.throughput.instant();
console.log(`Current: ${instant.in} bytes/sec in, ${instant.out} bytes/sec out`);
const recent = metrics.throughput.recent(); // Last 10 seconds
const average = metrics.throughput.average(); // Last 60 seconds
// Custom time window
const custom = metrics.throughput.custom(30); // Last 30 seconds
// Historical data for graphing
const history = metrics.throughput.history(300); // Last 5 minutes
history.forEach(point => {
console.log(`${new Date(point.timestamp)}: ${point.in} bytes/sec in, ${point.out} bytes/sec out`);
});
// Top routes by throughput
const routeThroughput = metrics.throughput.byRoute(60);
routeThroughput.forEach((stats, route) => {
console.log(`Route ${route}: ${stats.in} bytes/sec in, ${stats.out} bytes/sec out`);
});
// Request metrics
console.log(`RPS: ${metrics.requests.perSecond()}`);
console.log(`RPM: ${metrics.requests.perMinute()}`);
// Totals
console.log(`Total bytes in: ${metrics.totals.bytesIn()}`);
console.log(`Total bytes out: ${metrics.totals.bytesOut()}`);
7. Prometheus Export Example
# HELP smartproxy_throughput_bytes_per_second Current throughput in bytes per second
# TYPE smartproxy_throughput_bytes_per_second gauge
smartproxy_throughput_bytes_per_second{direction="in",window="1s"} 1234567
smartproxy_throughput_bytes_per_second{direction="out",window="1s"} 987654
smartproxy_throughput_bytes_per_second{direction="in",window="10s"} 1134567
smartproxy_throughput_bytes_per_second{direction="out",window="10s"} 887654
# HELP smartproxy_bytes_total Total bytes transferred
# TYPE smartproxy_bytes_total counter
smartproxy_bytes_total{direction="in"} 123456789
smartproxy_bytes_total{direction="out"} 98765432
# HELP smartproxy_active_connections Current number of active connections
# TYPE smartproxy_active_connections gauge
smartproxy_active_connections 42
# HELP smartproxy_connection_duration_seconds Connection duration in seconds
# TYPE smartproxy_connection_duration_seconds histogram
smartproxy_connection_duration_seconds_bucket{le="0.1"} 100
smartproxy_connection_duration_seconds_bucket{le="1"} 500
smartproxy_connection_duration_seconds_bucket{le="10"} 800
smartproxy_connection_duration_seconds_bucket{le="+Inf"} 850
smartproxy_connection_duration_seconds_sum 4250
smartproxy_connection_duration_seconds_count 850
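The throughput gauge lines above could be produced by a small renderer along the following lines. This is a sketch assuming rates are already available per window label. The names renderThroughputGauge and escapeLabelValue are hypothetical, and only the gauge family is covered, not the counter or histogram.

```typescript
type Rate = { in: number; out: number };

// Escapes backslash, double quote and newline, the characters that must be
// escaped inside label values in the Prometheus text format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Renders one gauge family with a line per direction and window label.
function renderThroughputGauge(prefix: string, ratesByWindow: Map<string, Rate>): string {
  const name = `${prefix}throughput_bytes_per_second`;
  const lines: string[] = [
    `# HELP ${name} Current throughput in bytes per second`,
    `# TYPE ${name} gauge`
  ];
  ratesByWindow.forEach((rate, windowLabel) => {
    const label = escapeLabelValue(windowLabel);
    lines.push(`${name}{direction="in",window="${label}"} ${rate.in}`);
    lines.push(`${name}{direction="out",window="${label}"} ${rate.out}`);
  });
  return lines.join('\n') + '\n';
}

const gaugeText = renderThroughputGauge(
  'smartproxy_',
  new Map<string, Rate>([['1s', { in: 1234567, out: 987654 }]])
);
```

Rates are exposed as gauges because they can go down as well as up, while the byte totals are counters that only increase.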
8. Migration Strategy
Breaking Changes
- Completely replace the old metrics API with the new clean design
- Remove all get*-prefixed methods in favor of grouped properties
- Use simple { in, out } objects instead of verbose property names
- Provide clear migration guide in documentation
Implementation Approach
- ✅ Create new ThroughputTracker class for time-series data
- ✅ Implement new IMetrics interface with clean API
- ✅ Replace MetricsCollector implementation entirely
- ✅ Update all references to use new API
- ⚠️ Add comprehensive tests for accuracy validation (partial)
Additional Refactoring Completed
- Refactored all SmartProxy components to use cleaner dependency pattern
- Components now receive only the SmartProxy instance instead of individual dependencies
- Access to other components via this.smartProxy.componentName
- Significantly simplified constructor signatures across the codebase
9. Success Metrics
- Accuracy: Throughput metrics accurate within 1% of actual
- Performance: < 1% CPU overhead for metrics collection
- Memory: < 10MB memory usage for 1 hour of data
- Latency: < 1ms to retrieve any metric
- Reliability: No metrics data loss under load
10. Future Enhancements
Phase 5: Advanced Analytics
- Anomaly detection for traffic patterns
- Predictive analytics for capacity planning
- Correlation analysis between routes
- Real-time alerting integration
Phase 6: Distributed Metrics
- Metrics aggregation across multiple proxies
- Distributed time-series storage
- Cross-proxy analytics
- Global dashboard support
11. Risks and Mitigations
Risk: Memory Usage
- Mitigation: Circular buffers and configurable retention
- Monitoring: Track memory usage per metric type
Risk: Performance Impact
- Mitigation: Efficient data structures and caching
- Testing: Load test with metrics enabled/disabled
Risk: Data Accuracy
- Mitigation: Atomic operations and proper synchronization
- Validation: Compare with external monitoring tools
Conclusion
This plan transforms SmartProxy's metrics from a basic cumulative system to a comprehensive, time-series based monitoring solution suitable for production environments. The phased approach ensures minimal disruption while delivering immediate value through accurate throughput measurements.