SmartProxy Metrics Improvement Plan

Overview

The current getThroughputRate() implementation accumulates bytes over a 60-second window rather than computing an actual rate, which makes the reported values misleading for monitoring systems. This plan outlines a comprehensive redesign of the metrics system to provide accurate, time-series based metrics suitable for production monitoring.

1. Core Issues with Current Implementation

  • Cumulative vs Rate: Current method accumulates all bytes from connections in the last minute rather than calculating actual throughput rate
  • No Time-Series Data: Cannot track throughput changes over time
  • Inaccurate Estimates: Attempting to estimate rates for older connections is fundamentally flawed
  • No Sliding Windows: Cannot provide different time window views (1s, 10s, 60s, etc.)
  • Limited Granularity: Only provides a single 60-second view

2. Proposed Architecture

A. Time-Series Throughput Tracking

interface IThroughputSample {
  timestamp: number;
  bytesIn: number;
  bytesOut: number;
}

class ThroughputTracker {
  private samples: IThroughputSample[] = [];
  private readonly MAX_SAMPLES = 3600; // 1 hour at 1 sample/second
  private lastSampleTime: number = 0;
  private accumulatedBytesIn: number = 0;
  private accumulatedBytesOut: number = 0;
  
  // Called on every data transfer
  public recordBytes(bytesIn: number, bytesOut: number): void {
    this.accumulatedBytesIn += bytesIn;
    this.accumulatedBytesOut += bytesOut;
  }
  
  // Called periodically (every second)
  public takeSample(): void {
    const now = Date.now();
    
    // Record accumulated bytes since last sample
    this.samples.push({
      timestamp: now,
      bytesIn: this.accumulatedBytesIn,
      bytesOut: this.accumulatedBytesOut
    });
    
    // Reset accumulators and remember when this sample was taken
    this.accumulatedBytesIn = 0;
    this.accumulatedBytesOut = 0;
    this.lastSampleTime = now;
    
    // Trim old samples
    const cutoff = now - 3600000; // 1 hour
    this.samples = this.samples.filter(s => s.timestamp > cutoff);
  }
  
  // Get rate over specified window
  public getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
    const now = Date.now();
    const windowStart = now - (windowSeconds * 1000);
    
    const relevantSamples = this.samples.filter(s => s.timestamp > windowStart);
    
    if (relevantSamples.length === 0) {
      return { bytesInPerSec: 0, bytesOutPerSec: 0 };
    }
    
    const totalBytesIn = relevantSamples.reduce((sum, s) => sum + s.bytesIn, 0);
    const totalBytesOut = relevantSamples.reduce((sum, s) => sum + s.bytesOut, 0);
    
    // Use the span actually covered by the samples, guarding against division
    // by zero when only a single sample falls inside the window
    const actualWindow = Math.max((now - relevantSamples[0].timestamp) / 1000, 1);
    
    return {
      bytesInPerSec: Math.round(totalBytesIn / actualWindow),
      bytesOutPerSec: Math.round(totalBytesOut / actualWindow)
    };
  }
}
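
A minimal sketch of how the tracker might be wired into the MetricsCollector follows; the collector shape, the start()/stop() hooks, and the use of setInterval are assumptions about the integration, not the final design.

// Sketch only: drive takeSample() from a timer and expose recordBytes()
// to the connection handling code. Names other than ThroughputTracker
// and MetricsCollector are illustrative.
class MetricsCollector {
  private tracker = new ThroughputTracker();
  private sampleTimer?: NodeJS.Timeout;

  public start(sampleIntervalMs: number = 1000): void {
    const timer = setInterval(() => this.tracker.takeSample(), sampleIntervalMs);
    timer.unref(); // don't keep the process alive just for metrics sampling
    this.sampleTimer = timer;
  }

  public recordBytes(bytesIn: number, bytesOut: number): void {
    this.tracker.recordBytes(bytesIn, bytesOut);
  }

  public getThroughput(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
    return this.tracker.getRate(windowSeconds);
  }

  public stop(): void {
    if (this.sampleTimer) clearInterval(this.sampleTimer);
  }
}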

B. Connection-Level Byte Tracking

// In ConnectionRecord, add:
interface IConnectionRecord {
  // ... existing fields ...
  
  // Byte counters with timestamps (a circular buffer could back these for efficiency)
  bytesReceivedHistory: Array<{ timestamp: number; bytes: number }>;
  bytesSentHistory: Array<{ timestamp: number; bytes: number }>;
  
  // Timestamps of the most recent updates to the counters above
  lastBytesReceivedUpdate: number;
  lastBytesSentUpdate: number;
}
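
A possible helper for maintaining these per-connection histories is sketched below; the function name, the 60-second retention value, and the in-place trimming are assumptions.

// Hypothetical helper: append a byte-count entry to a connection's history
// and drop entries that have fallen out of the retention window.
function recordConnectionBytes(
  record: IConnectionRecord,
  direction: 'in' | 'out',
  bytes: number,
  retentionMs: number = 60_000
): void {
  const now = Date.now();
  const history = direction === 'in' ? record.bytesReceivedHistory : record.bytesSentHistory;
  history.push({ timestamp: now, bytes });

  // Trim entries older than the retention window
  while (history.length > 0 && history[0].timestamp < now - retentionMs) {
    history.shift();
  }

  if (direction === 'in') {
    record.lastBytesReceivedUpdate = now;
  } else {
    record.lastBytesSentUpdate = now;
  }
}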

C. Enhanced Metrics Interface

interface IMetrics {
  // Connection metrics
  connections: {
    active(): number;
    total(): number;
    byRoute(): Map<string, number>;
    byIP(): Map<string, number>;
    topIPs(limit?: number): Array<{ ip: string; count: number }>;
  };
  
  // Throughput metrics (bytes per second)
  throughput: {
    instant(): { in: number; out: number };      // Last 1 second
    recent(): { in: number; out: number };       // Last 10 seconds  
    average(): { in: number; out: number };      // Last 60 seconds
    custom(seconds: number): { in: number; out: number };
    history(seconds: number): Array<{ timestamp: number; in: number; out: number }>;
    byRoute(windowSeconds?: number): Map<string, { in: number; out: number }>;
    byIP(windowSeconds?: number): Map<string, { in: number; out: number }>;
  };
  
  // Request metrics
  requests: {
    perSecond(): number;
    perMinute(): number;
    total(): number;
  };
  
  // Cumulative totals
  totals: {
    bytesIn(): number;
    bytesOut(): number;
    connections(): number;
  };
  
  // Performance metrics
  percentiles: {
    connectionDuration(): { p50: number; p95: number; p99: number };
    bytesTransferred(): { 
      in: { p50: number; p95: number; p99: number };
      out: { p50: number; p95: number; p99: number };
    };
  };
}

3. Implementation Plan

Current Status

  • Phase 1: ~90% complete (core functionality implemented, tests need fixing)
  • Phase 2: ~60% complete (main features done, percentiles pending)
  • Phase 3: ~40% complete (basic optimizations in place)
  • Phase 4: 0% complete (export formats not started)

Phase 1: Core Throughput Tracking (Week 1)

  • Implement ThroughputTracker class
  • Integrate byte recording into socket data handlers (see the sketch after this list)
  • Add periodic sampling (1-second intervals)
  • Replace getThroughputRate() with the new clean time-series based API
  • Add unit tests for throughput tracking
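
For the socket integration item above, the hook could look roughly like this; attachByteCounting and the two-socket shape are assumptions, since the real data handlers live inside SmartProxy's connection handling code.

import * as net from 'net';

// Sketch only: count bytes in both directions while client and backend
// sockets are relayed. The listeners observe data without altering flow.
function attachByteCounting(
  clientSocket: net.Socket,
  backendSocket: net.Socket,
  collector: { recordBytes(bytesIn: number, bytesOut: number): void }
): void {
  clientSocket.on('data', (chunk: Buffer) => {
    collector.recordBytes(chunk.length, 0); // bytes received from the client
  });
  backendSocket.on('data', (chunk: Buffer) => {
    collector.recordBytes(0, chunk.length); // bytes headed back to the client
  });
}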

Phase 2: Enhanced Metrics (Week 2)

  • Add configurable time windows (1s, 10s, 60s, 5m, etc.)
  • Implement percentile calculations (see the sketch after this list)
  • Add route-specific and IP-specific throughput tracking
  • Create historical data access methods
  • Add integration tests
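
For the percentile item above, a nearest-rank calculation over the recorded values would be a reasonable starting point; the helper below is a sketch and assumes the inputs are plain arrays of numbers (for example connection durations in milliseconds).

// Nearest-rank percentile; returns 0 for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(rank, sorted.length - 1))];
}

function durationPercentiles(durationsMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(durationsMs, 50),
    p95: percentile(durationsMs, 95),
    p99: percentile(durationsMs, 99)
  };
}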

Phase 3: Performance Optimization (Week 3)

  • Use circular buffers for efficiency
  • Implement data aggregation for longer time windows
  • Add configurable retention periods
  • Optimize memory usage
  • Add performance benchmarks

Phase 4: Export Formats (Week 4)

  • Add Prometheus metric format with proper metric types
  • Add StatsD format support
  • Add JSON export with metadata
  • Create OpenMetrics compatibility
  • Add documentation and examples

4. Key Design Decisions

A. Sampling Strategy

  • 1-second samples for fine-grained data
  • Aggregate to 1-minute buckets for longer retention (sketched after this list)
  • Keep 1 hour of second-level data
  • Keep 24 hours of minute-level data
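
The second-to-minute aggregation could be as simple as the sketch below, which reuses IThroughputSample from section 2A; the bucketing approach is an assumption.

// Roll one-second samples up into minute-level buckets. Each bucket keeps
// the byte totals for that minute; rates can be derived later by dividing
// by the bucket length.
function aggregateToMinutes(secondSamples: IThroughputSample[]): IThroughputSample[] {
  const buckets = new Map<number, IThroughputSample>();
  for (const sample of secondSamples) {
    const minuteStart = Math.floor(sample.timestamp / 60_000) * 60_000;
    const bucket = buckets.get(minuteStart) ?? { timestamp: minuteStart, bytesIn: 0, bytesOut: 0 };
    bucket.bytesIn += sample.bytesIn;
    bucket.bytesOut += sample.bytesOut;
    buckets.set(minuteStart, bucket);
  }
  return [...buckets.values()].sort((a, b) => a.timestamp - b.timestamp);
}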

B. Memory Management

  • Circular buffers for fixed memory usage (sketched after this list)
  • Configurable retention periods
  • Lazy aggregation for older data
  • Efficient data structures (typed arrays for samples)
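
A sketch of the circular-buffer idea using typed arrays; the 3600-slot size matches one hour of second-level samples, and the parallel-array layout is an assumption.

// Fixed-size ring buffer for second-level samples. Three parallel typed
// arrays keep memory usage flat regardless of traffic volume.
class ThroughputRingBuffer {
  private readonly size = 3600;
  private timestamps = new Float64Array(this.size);
  private bytesIn = new Float64Array(this.size);
  private bytesOut = new Float64Array(this.size);
  private head = 0;   // next write position
  private count = 0;  // number of valid entries

  public push(timestamp: number, inBytes: number, outBytes: number): void {
    this.timestamps[this.head] = timestamp;
    this.bytesIn[this.head] = inBytes;
    this.bytesOut[this.head] = outBytes;
    this.head = (this.head + 1) % this.size;
    this.count = Math.min(this.count + 1, this.size);
  }

  // Sum bytes for samples newer than the cutoff timestamp, walking from
  // newest to oldest and stopping as soon as an older entry is reached.
  public sumSince(cutoff: number): { bytesIn: number; bytesOut: number } {
    let totalIn = 0;
    let totalOut = 0;
    for (let i = 0; i < this.count; i++) {
      const idx = (this.head - 1 - i + this.size) % this.size;
      if (this.timestamps[idx] <= cutoff) break;
      totalIn += this.bytesIn[idx];
      totalOut += this.bytesOut[idx];
    }
    return { bytesIn: totalIn, bytesOut: totalOut };
  }
}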

C. Performance Considerations

  • Batch updates during high throughput
  • Debounced calculations for expensive metrics
  • Cached results with TTL (sketched after this list)
  • Worker thread option for heavy calculations
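
The cached-results item could be served by a small memoization helper such as the one below; the names and the one-second TTL in the usage comment are illustrative.

// Cache the result of an expensive metric calculation for ttlMs milliseconds.
function cachedWithTtl<T>(compute: () => T, ttlMs: number): () => T {
  let value: T | undefined;
  let expiresAt = 0;
  return () => {
    const now = Date.now();
    if (value === undefined || now >= expiresAt) {
      value = compute();
      expiresAt = now + ttlMs;
    }
    return value;
  };
}

// Example: recompute duration percentiles at most once per second
// (allDurationsMs is a placeholder for however durations are collected).
// const getDurationPercentiles = cachedWithTtl(() => durationPercentiles(allDurationsMs), 1000);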

5. Configuration Options

interface IMetricsConfig {
  enabled: boolean;
  
  // Sampling configuration
  sampleIntervalMs: number;        // Default: 1000 (1 second)
  retentionSeconds: number;        // Default: 3600 (1 hour)
  
  // Performance tuning
  enableDetailedTracking: boolean; // Per-connection byte history
  enablePercentiles: boolean;      // Calculate percentiles
  cacheResultsMs: number;         // Cache expensive calculations
  
  // Export configuration
  prometheusEnabled: boolean;
  prometheusPath: string;         // Default: /metrics
  prometheusPrefix: string;       // Default: smartproxy_
}
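
The defaults could be applied with a simple merge when the collector is constructed; the sketch below uses the documented defaults where they exist, and the remaining values (the booleans and cacheResultsMs) are assumptions.

// Assumed defaults: only sampleIntervalMs, retentionSeconds, prometheusPath
// and prometheusPrefix are documented above; the rest are placeholders.
const DEFAULT_METRICS_CONFIG: IMetricsConfig = {
  enabled: true,
  sampleIntervalMs: 1000,
  retentionSeconds: 3600,
  enableDetailedTracking: false,
  enablePercentiles: true,
  cacheResultsMs: 1000,
  prometheusEnabled: false,
  prometheusPath: '/metrics',
  prometheusPrefix: 'smartproxy_'
};

function resolveMetricsConfig(partial: Partial<IMetricsConfig> = {}): IMetricsConfig {
  return { ...DEFAULT_METRICS_CONFIG, ...partial };
}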

6. Example Usage

const proxy = new SmartProxy({
  metrics: {
    enabled: true,
    sampleIntervalMs: 1000,
    enableDetailedTracking: true
  }
});

// Get metrics instance
const metrics = proxy.getMetrics();

// Connection metrics
console.log(`Active connections: ${metrics.connections.active()}`);
console.log(`Total connections: ${metrics.connections.total()}`);

// Throughput metrics
const instant = metrics.throughput.instant();
console.log(`Current: ${instant.in} bytes/sec in, ${instant.out} bytes/sec out`);

const recent = metrics.throughput.recent();   // Last 10 seconds
const average = metrics.throughput.average(); // Last 60 seconds

// Custom time window
const custom = metrics.throughput.custom(30); // Last 30 seconds

// Historical data for graphing
const history = metrics.throughput.history(300); // Last 5 minutes
history.forEach(point => {
  console.log(`${new Date(point.timestamp)}: ${point.in} bytes/sec in, ${point.out} bytes/sec out`);
});

// Top routes by throughput
const routeThroughput = metrics.throughput.byRoute(60);
routeThroughput.forEach((stats, route) => {
  console.log(`Route ${route}: ${stats.in} bytes/sec in, ${stats.out} bytes/sec out`);
});

// Request metrics
console.log(`RPS: ${metrics.requests.perSecond()}`);
console.log(`RPM: ${metrics.requests.perMinute()}`);

// Totals
console.log(`Total bytes in: ${metrics.totals.bytesIn()}`);
console.log(`Total bytes out: ${metrics.totals.bytesOut()}`);

7. Prometheus Export Example

# HELP smartproxy_throughput_bytes_per_second Current throughput in bytes per second
# TYPE smartproxy_throughput_bytes_per_second gauge
smartproxy_throughput_bytes_per_second{direction="in",window="1s"} 1234567
smartproxy_throughput_bytes_per_second{direction="out",window="1s"} 987654
smartproxy_throughput_bytes_per_second{direction="in",window="10s"} 1134567
smartproxy_throughput_bytes_per_second{direction="out",window="10s"} 887654

# HELP smartproxy_bytes_total Total bytes transferred
# TYPE smartproxy_bytes_total counter
smartproxy_bytes_total{direction="in"} 123456789
smartproxy_bytes_total{direction="out"} 98765432

# HELP smartproxy_active_connections Current number of active connections
# TYPE smartproxy_active_connections gauge
smartproxy_active_connections 42

# HELP smartproxy_connection_duration_seconds Connection duration in seconds
# TYPE smartproxy_connection_duration_seconds histogram
smartproxy_connection_duration_seconds_bucket{le="0.1"} 100
smartproxy_connection_duration_seconds_bucket{le="1"} 500
smartproxy_connection_duration_seconds_bucket{le="10"} 800
smartproxy_connection_duration_seconds_bucket{le="+Inf"} 850
smartproxy_connection_duration_seconds_sum 4250
smartproxy_connection_duration_seconds_count 850
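
A sketch of how this exposition format could be rendered from the IMetrics interface in section 2C; histogram output is omitted for brevity and the function name is illustrative.

// Render a minimal Prometheus exposition from an IMetrics instance.
function renderPrometheus(metrics: IMetrics, prefix: string = 'smartproxy_'): string {
  const lines: string[] = [];
  const instant = metrics.throughput.instant();

  lines.push(`# HELP ${prefix}throughput_bytes_per_second Current throughput in bytes per second`);
  lines.push(`# TYPE ${prefix}throughput_bytes_per_second gauge`);
  lines.push(`${prefix}throughput_bytes_per_second{direction="in",window="1s"} ${instant.in}`);
  lines.push(`${prefix}throughput_bytes_per_second{direction="out",window="1s"} ${instant.out}`);

  lines.push(`# HELP ${prefix}bytes_total Total bytes transferred`);
  lines.push(`# TYPE ${prefix}bytes_total counter`);
  lines.push(`${prefix}bytes_total{direction="in"} ${metrics.totals.bytesIn()}`);
  lines.push(`${prefix}bytes_total{direction="out"} ${metrics.totals.bytesOut()}`);

  lines.push(`# HELP ${prefix}active_connections Current number of active connections`);
  lines.push(`# TYPE ${prefix}active_connections gauge`);
  lines.push(`${prefix}active_connections ${metrics.connections.active()}`);

  return lines.join('\n') + '\n';
}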

8. Migration Strategy

Breaking Changes

  • Completely replace the old metrics API with the new clean design
  • Remove all get* prefixed methods in favor of grouped properties (see the before/after sketch after this list)
  • Use simple { in, out } objects instead of verbose property names
  • Provide clear migration guide in documentation
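
A rough before/after for the migration guide; the old call shape is inferred from the current getThroughputRate() method and is shown only for contrast.

// Before (old cumulative API, shape assumed):
// const rate = proxy.getThroughputRate(); // 60-second cumulative value, not a true rate

// After (new time-series API from section 2C):
const metrics = proxy.getMetrics();
const { in: bytesInPerSec, out: bytesOutPerSec } = metrics.throughput.average(); // true 60s rate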

Implementation Approach

  1. Create new ThroughputTracker class for time-series data
  2. Implement new IMetrics interface with clean API
  3. Replace MetricsCollector implementation entirely
  4. Update all references to use new API
  5. Add comprehensive tests for accuracy validation

9. Success Metrics

  • Accuracy: Throughput metrics accurate within 1% of actual
  • Performance: < 1% CPU overhead for metrics collection
  • Memory: < 10MB memory usage for 1 hour of data
  • Latency: < 1ms to retrieve any metric
  • Reliability: No metrics data loss under load

10. Future Enhancements

Phase 5: Advanced Analytics

  • Anomaly detection for traffic patterns
  • Predictive analytics for capacity planning
  • Correlation analysis between routes
  • Real-time alerting integration

Phase 6: Distributed Metrics

  • Metrics aggregation across multiple proxies
  • Distributed time-series storage
  • Cross-proxy analytics
  • Global dashboard support

11. Risks and Mitigations

Risk: Memory Usage

  • Mitigation: Circular buffers and configurable retention
  • Monitoring: Track memory usage per metric type

Risk: Performance Impact

  • Mitigation: Efficient data structures and caching
  • Testing: Load test with metrics enabled/disabled

Risk: Data Accuracy

  • Mitigation: Atomic operations and proper synchronization
  • Validation: Compare with external monitoring tools

Conclusion

This plan transforms SmartProxy's metrics from a basic cumulative system to a comprehensive, time-series based monitoring solution suitable for production environments. The phased approach ensures minimal disruption while delivering immediate value through accurate throughput measurements.