smartproxy/readme.plan.md
2025-06-22 23:10:56 +00:00

SmartProxy Metrics Improvement Plan

Overview

The current getThroughputRate() implementation reports the cumulative byte count over a 60-second window rather than an actual rate, which makes the metric misleading for monitoring systems. This plan outlines a redesign of the metrics system that provides accurate, time-series-based metrics suitable for production monitoring.

1. Core Issues with Current Implementation

  • Cumulative vs Rate: Current method accumulates all bytes from connections in the last minute rather than calculating actual throughput rate
  • No Time-Series Data: Cannot track throughput changes over time
  • Inaccurate Estimates: Attempting to estimate rates for older connections is fundamentally flawed
  • No Sliding Windows: Cannot provide different time window views (1s, 10s, 60s, etc.)
  • Limited Granularity: Only provides a single 60-second view
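
To make the first issue concrete, the following sketch contrasts a cumulative byte count with a per-second rate over the same window. The traffic pattern and the function names are hypothetical and only illustrate the arithmetic.

```typescript
// Hypothetical per-second byte counts for a 60-second window:
// a 10-second burst at 6,000,000 bytes/sec followed by 50 idle seconds.
const bytesPerSecond: number[] = [
  ...new Array<number>(10).fill(6000000),
  ...new Array<number>(50).fill(0),
];

// What a cumulative implementation reports: total bytes in the window.
function cumulativeBytes(samples: number[]): number {
  return samples.reduce((sum, bytes) => sum + bytes, 0);
}

// What a rate implementation reports: bytes divided by the seconds covered.
function averageRate(samples: number[]): number {
  return samples.length === 0 ? 0 : cumulativeBytes(samples) / samples.length;
}

const totalInWindow = cumulativeBytes(bytesPerSecond);      // 60,000,000 bytes
const rateOver60s = averageRate(bytesPerSecond);            // 1,000,000 bytes/sec
const rateOverLast10s = averageRate(bytesPerSecond.slice(-10)); // 0 bytes/sec
```

The cumulative figure is 60 times larger than the 60-second rate, and only a shorter window reveals that traffic has already stopped.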

2. Proposed Architecture

A. Time-Series Throughput Tracking

interface IThroughputSample {
  timestamp: number;
  bytesIn: number;
  bytesOut: number;
}

class ThroughputTracker {
  private samples: IThroughputSample[] = [];
  private readonly MAX_SAMPLES = 3600; // 1 hour at 1 sample/second
  private lastSampleTime: number = 0;
  private accumulatedBytesIn: number = 0;
  private accumulatedBytesOut: number = 0;
  
  // Called on every data transfer
  public recordBytes(bytesIn: number, bytesOut: number): void {
    this.accumulatedBytesIn += bytesIn;
    this.accumulatedBytesOut += bytesOut;
  }
  
  // Called periodically (every second)
  public takeSample(): void {
    const now = Date.now();
    
    // Record accumulated bytes since last sample
    this.samples.push({
      timestamp: now,
      bytesIn: this.accumulatedBytesIn,
      bytesOut: this.accumulatedBytesOut
    });
    
    // Reset accumulators
    this.accumulatedBytesIn = 0;
    this.accumulatedBytesOut = 0;
    
    // Trim old samples: drop anything older than 1 hour, then enforce MAX_SAMPLES
    const cutoff = now - 3600000; // 1 hour
    this.samples = this.samples.filter(s => s.timestamp > cutoff).slice(-this.MAX_SAMPLES);
  }
  
  // Get rate over specified window
  public getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number } {
    const now = Date.now();
    const windowStart = now - (windowSeconds * 1000);
    
    const relevantSamples = this.samples.filter(s => s.timestamp > windowStart);
    
    if (relevantSamples.length === 0) {
      return { bytesInPerSec: 0, bytesOutPerSec: 0 };
    }
    
    const totalBytesIn = relevantSamples.reduce((sum, s) => sum + s.bytesIn, 0);
    const totalBytesOut = relevantSamples.reduce((sum, s) => sum + s.bytesOut, 0);
    
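    // Each sample covers the sampling interval that ended at its timestamp,
    // so the time covered is one interval per sample (1 second, matching the
    // takeSample() schedule). Dividing by (now - first timestamp) instead
    // would approach zero right after a sample is taken and inflate the rate.
    const actualWindow = relevantSamples.length;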
    
    return {
      bytesInPerSec: Math.round(totalBytesIn / actualWindow),
      bytesOutPerSec: Math.round(totalBytesOut / actualWindow)
    };
  }
}
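
One possible way to feed the tracker is sketched below: byte counts are recorded from socket data events and a timer triggers the periodic sample. The helper names attachByteRecording and startSampling, and the convention that client data counts as "in" and upstream data as "out", are assumptions for illustration and not part of the existing code.

```typescript
// Minimal structural types so the sketch does not depend on Node typings.
// A real socket would satisfy DataSource because its chunks have a `length`.
interface ByteRecorder {
  recordBytes(bytesIn: number, bytesOut: number): void;
  takeSample(): void;
}

interface DataSource {
  on(event: "data", listener: (chunk: { length: number }) => void): unknown;
}

// Assumed convention: bytes from the client are "in", bytes from the
// upstream target are "out".
function attachByteRecording(
  client: DataSource,
  upstream: DataSource,
  tracker: ByteRecorder,
): void {
  client.on("data", (chunk) => tracker.recordBytes(chunk.length, 0));
  upstream.on("data", (chunk) => tracker.recordBytes(0, chunk.length));
}

// Starts the periodic sampler and returns a function that stops it.
function startSampling(tracker: ByteRecorder, intervalMs = 1000): () => void {
  const timer = setInterval(() => tracker.takeSample(), intervalMs);
  return () => clearInterval(timer);
}
```

Returning a stop function keeps timer cleanup explicit, so the sampler can be shut down together with the proxy.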

B. Connection-Level Byte Tracking

// In ConnectionRecord, add:
interface IConnectionRecord {
  // ... existing fields ...
  
  // Byte counters with timestamps
  bytesReceivedHistory: Array<{ timestamp: number; bytes: number }>;
  bytesSentHistory: Array<{ timestamp: number; bytes: number }>;
  
  // Note: for efficiency, the history arrays above could be backed by circular buffers
  lastBytesReceivedUpdate: number;
  lastBytesSentUpdate: number;
}

C. Enhanced Metrics Interface

interface IMetrics {
  // Connection metrics
  connections: {
    active(): number;
    total(): number;
    byRoute(): Map<string, number>;
    byIP(): Map<string, number>;
    topIPs(limit?: number): Array<{ ip: string; count: number }>;
  };
  
  // Throughput metrics (bytes per second)
  throughput: {
    instant(): { in: number; out: number };      // Last 1 second
    recent(): { in: number; out: number };       // Last 10 seconds  
    average(): { in: number; out: number };      // Last 60 seconds
    custom(seconds: number): { in: number; out: number };
    history(seconds: number): Array<{ timestamp: number; in: number; out: number }>;
    byRoute(windowSeconds?: number): Map<string, { in: number; out: number }>;
    byIP(windowSeconds?: number): Map<string, { in: number; out: number }>;
  };
  
  // Request metrics
  requests: {
    perSecond(): number;
    perMinute(): number;
    total(): number;
  };
  
  // Cumulative totals
  totals: {
    bytesIn(): number;
    bytesOut(): number;
    connections(): number;
  };
  
  // Performance metrics
  percentiles: {
    connectionDuration(): { p50: number; p95: number; p99: number };
    bytesTransferred(): { 
      in: { p50: number; p95: number; p99: number };
      out: { p50: number; p95: number; p99: number };
    };
  };
}
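
As a rough illustration of how the throughput group could sit on top of a tracker, the sketch below maps each named window to a getRate() call. The RateSource interface mirrors the getRate() signature from section A; createThroughputView is a hypothetical helper name.

```typescript
// Mirrors the getRate() signature of ThroughputTracker in section A.
interface RateSource {
  getRate(windowSeconds: number): { bytesInPerSec: number; bytesOutPerSec: number };
}

type InOut = { in: number; out: number };

// Hypothetical adapter: exposes the named windows from the interface above
// (1 s, 10 s, 60 s) as thin wrappers around a single custom-window lookup.
function createThroughputView(source: RateSource) {
  const custom = (seconds: number): InOut => {
    const rate = source.getRate(seconds);
    return { in: rate.bytesInPerSec, out: rate.bytesOutPerSec };
  };
  return {
    instant: (): InOut => custom(1),
    recent: (): InOut => custom(10),
    average: (): InOut => custom(60),
    custom,
  };
}
```

Keeping every window behind one custom() function means a fix to the rate calculation applies to all views at once.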

3. Implementation Plan

Current Status

  • Phase 1: ~90% complete (core functionality implemented, tests need fixing)
  • Phase 2: ~60% complete (main features done, percentiles pending)
  • Phase 3: ~40% complete (basic optimizations in place)
  • Phase 4: 0% complete (export formats not started)

Phase 1: Core Throughput Tracking (Week 1)

  • Implement ThroughputTracker class
  • Integrate byte recording into socket data handlers
  • Add periodic sampling (1-second intervals)
  • Update getThroughputRate() to use time-series data (replaced with new clean API)
  • Add unit tests for throughput tracking

Phase 2: Enhanced Metrics (Week 2)

  • Add configurable time windows (1s, 10s, 60s, 5m, etc.)
  • Implement percentile calculations
  • Add route-specific and IP-specific throughput tracking
  • Create historical data access methods
  • Add integration tests
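
For the pending percentile work, a minimal sketch using the nearest-rank method is shown below. The plan does not state which percentile definition will be used, so nearest-rank is an assumption, as are the function names.

```typescript
// Nearest-rank percentile: the smallest value such that at least p percent
// of the data is less than or equal to it. Expects ascending input.
function percentile(sortedAscending: number[], p: number): number {
  if (sortedAscending.length === 0) return 0;
  const rank = Math.ceil((p * sortedAscending.length) / 100);
  const index = Math.min(sortedAscending.length - 1, Math.max(0, rank - 1));
  return sortedAscending[index];
}

// Produces the { p50, p95, p99 } shape used by the percentiles group.
function summarizePercentiles(values: number[]): { p50: number; p95: number; p99: number } {
  const sorted = [...values].sort((a, b) => a - b); // copy so the input is not mutated
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

Sorting on every call costs O(n log n), which is one reason the plan lists caching and debouncing for expensive metrics.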

Phase 3: Performance Optimization (Week 3)

  • Use circular buffers for efficiency
  • Implement data aggregation for longer time windows
  • Add configurable retention periods
  • Optimize memory usage
  • Add performance benchmarks

Phase 4: Export Formats (Week 4)

  • Add Prometheus metric format with proper metric types
  • Add StatsD format support
  • Add JSON export with metadata
  • Create OpenMetrics compatibility
  • Add documentation and examples

4. Key Design Decisions

A. Sampling Strategy

  • 1-second samples for fine-grained data
  • Aggregate to 1-minute samples for longer retention
  • Keep 1 hour of second-level data
  • Keep 24 hours of minute-level data

B. Memory Management

  • Circular buffers for fixed memory usage
  • Configurable retention periods
  • Lazy aggregation for older data
  • Efficient data structures (typed arrays for samples)
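
A fixed-size circular buffer over typed arrays, as named in the list above, might be sketched as follows. The class name SampleRingBuffer and its methods are hypothetical; Float64Array is used because byte counts and millisecond timestamps can exceed 32-bit ranges.

```typescript
class SampleRingBuffer {
  private readonly timestamps: Float64Array;
  private readonly bytesIn: Float64Array;
  private readonly bytesOut: Float64Array;
  private head = 0;  // index of the next slot to write
  private count = 0; // number of valid samples stored

  constructor(private readonly capacity: number) {
    this.timestamps = new Float64Array(capacity);
    this.bytesIn = new Float64Array(capacity);
    this.bytesOut = new Float64Array(capacity);
  }

  // Overwrites the oldest sample once the buffer is full.
  public push(timestamp: number, bytesIn: number, bytesOut: number): void {
    this.timestamps[this.head] = timestamp;
    this.bytesIn[this.head] = bytesIn;
    this.bytesOut[this.head] = bytesOut;
    this.head = (this.head + 1) % this.capacity;
    if (this.count < this.capacity) this.count++;
  }

  public get size(): number {
    return this.count;
  }

  // Sums samples newer than `since`, walking from newest to oldest and
  // stopping at the first sample that is too old.
  public sumSince(since: number): { bytesIn: number; bytesOut: number; samples: number } {
    let inSum = 0;
    let outSum = 0;
    let samples = 0;
    for (let i = 1; i <= this.count; i++) {
      const index = (this.head - i + this.capacity) % this.capacity;
      if (this.timestamps[index] <= since) break;
      inSum += this.bytesIn[index];
      outSum += this.bytesOut[index];
      samples++;
    }
    return { bytesIn: inSum, bytesOut: outSum, samples };
  }
}
```

Memory stays constant at three Float64Array allocations of `capacity` entries (24 bytes per sample), and no arrays are reallocated while sampling.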

C. Performance Considerations

  • Batch updates during high throughput
  • Debounced calculations for expensive metrics
  • Cached results with TTL
  • Worker thread option for heavy calculations
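
The "cached results with TTL" item can be illustrated with a small memoizing wrapper. This is a sketch under the assumption that a metric is a zero-argument function; cachedWithTtl and the injectable clock are hypothetical.

```typescript
// Wraps an expensive computation so it runs at most once per TTL window.
// The clock is injectable so the behavior can be tested without waiting.
function cachedWithTtl<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): () => T {
  let cached: { value: T; expiresAt: number } | undefined;
  return () => {
    const currentTime = now();
    if (cached === undefined || currentTime >= cached.expiresAt) {
      cached = { value: compute(), expiresAt: currentTime + ttlMs };
    }
    return cached.value;
  };
}
```

A wrapper like this would pair naturally with the cacheResultsMs setting, for example around percentile calculations that sort large arrays.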

5. Configuration Options

interface IMetricsConfig {
  enabled: boolean;
  
  // Sampling configuration
  sampleIntervalMs: number;        // Default: 1000 (1 second)
  retentionSeconds: number;        // Default: 3600 (1 hour)
  
  // Performance tuning
  enableDetailedTracking: boolean; // Per-connection byte history
  enablePercentiles: boolean;      // Calculate percentiles
  cacheResultsMs: number;         // Cache expensive calculations
  
  // Export configuration
  prometheusEnabled: boolean;
  prometheusPath: string;         // Default: /metrics
  prometheusPrefix: string;       // Default: smartproxy_
}
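
A possible way to apply the documented defaults is sketched below. Only the defaults for sampleIntervalMs, retentionSeconds, prometheusPath and prometheusPrefix come from the comments above; every other default value, and the resolveMetricsConfig helper itself, is an assumption for illustration.

```typescript
interface MetricsConfig {
  enabled: boolean;
  sampleIntervalMs: number;
  retentionSeconds: number;
  enableDetailedTracking: boolean;
  enablePercentiles: boolean;
  cacheResultsMs: number;
  prometheusEnabled: boolean;
  prometheusPath: string;
  prometheusPrefix: string;
}

const DEFAULT_METRICS_CONFIG: MetricsConfig = {
  enabled: true,                   // assumed default
  sampleIntervalMs: 1000,          // documented default (1 second)
  retentionSeconds: 3600,          // documented default (1 hour)
  enableDetailedTracking: false,   // assumed default
  enablePercentiles: false,        // assumed default
  cacheResultsMs: 1000,            // assumed default
  prometheusEnabled: false,        // assumed default
  prometheusPath: "/metrics",      // documented default
  prometheusPrefix: "smartproxy_", // documented default
};

// Merges user overrides onto the defaults and rejects values that would
// make sampling impossible.
function resolveMetricsConfig(overrides: Partial<MetricsConfig> = {}): MetricsConfig {
  const config: MetricsConfig = { ...DEFAULT_METRICS_CONFIG, ...overrides };
  if (!(config.sampleIntervalMs > 0)) {
    throw new RangeError("sampleIntervalMs must be greater than 0");
  }
  if (config.retentionSeconds * 1000 < config.sampleIntervalMs) {
    throw new RangeError("retentionSeconds must cover at least one sample interval");
  }
  return config;
}
```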

6. Example Usage

const proxy = new SmartProxy({
  metrics: {
    enabled: true,
    sampleIntervalMs: 1000,
    enableDetailedTracking: true
  }
});

// Get metrics instance
const metrics = proxy.getMetrics();

// Connection metrics
console.log(`Active connections: ${metrics.connections.active()}`);
console.log(`Total connections: ${metrics.connections.total()}`);

// Throughput metrics
const instant = metrics.throughput.instant();
console.log(`Current: ${instant.in} bytes/sec in, ${instant.out} bytes/sec out`);

const recent = metrics.throughput.recent();   // Last 10 seconds
const average = metrics.throughput.average(); // Last 60 seconds

// Custom time window
const custom = metrics.throughput.custom(30); // Last 30 seconds

// Historical data for graphing
const history = metrics.throughput.history(300); // Last 5 minutes
history.forEach(point => {
  console.log(`${new Date(point.timestamp)}: ${point.in} bytes/sec in, ${point.out} bytes/sec out`);
});

// Top routes by throughput
const routeThroughput = metrics.throughput.byRoute(60);
routeThroughput.forEach((stats, route) => {
  console.log(`Route ${route}: ${stats.in} bytes/sec in, ${stats.out} bytes/sec out`);
});

// Request metrics
console.log(`RPS: ${metrics.requests.perSecond()}`);
console.log(`RPM: ${metrics.requests.perMinute()}`);

// Totals
console.log(`Total bytes in: ${metrics.totals.bytesIn()}`);
console.log(`Total bytes out: ${metrics.totals.bytesOut()}`);

7. Prometheus Export Example

# HELP smartproxy_throughput_bytes_per_second Current throughput in bytes per second
# TYPE smartproxy_throughput_bytes_per_second gauge
smartproxy_throughput_bytes_per_second{direction="in",window="1s"} 1234567
smartproxy_throughput_bytes_per_second{direction="out",window="1s"} 987654
smartproxy_throughput_bytes_per_second{direction="in",window="10s"} 1134567
smartproxy_throughput_bytes_per_second{direction="out",window="10s"} 887654

# HELP smartproxy_bytes_total Total bytes transferred
# TYPE smartproxy_bytes_total counter
smartproxy_bytes_total{direction="in"} 123456789
smartproxy_bytes_total{direction="out"} 98765432

# HELP smartproxy_active_connections Current number of active connections
# TYPE smartproxy_active_connections gauge
smartproxy_active_connections 42

# HELP smartproxy_connection_duration_seconds Connection duration in seconds
# TYPE smartproxy_connection_duration_seconds histogram
smartproxy_connection_duration_seconds_bucket{le="0.1"} 100
smartproxy_connection_duration_seconds_bucket{le="1"} 500
smartproxy_connection_duration_seconds_bucket{le="10"} 800
smartproxy_connection_duration_seconds_bucket{le="+Inf"} 850
smartproxy_connection_duration_seconds_sum 4250
smartproxy_connection_duration_seconds_count 850
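
Since the export formats are not started yet, the following is only a sketch of how gauge lines like the ones above could be rendered in the Prometheus text exposition format. The formatGauge and escapeLabelValue helpers are hypothetical; the escaping covers backslash, double quote and newline in label values.

```typescript
interface GaugeSample {
  labels: Record<string, string>;
  value: number;
}

// Escapes the characters that are special inside a quoted label value.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Renders one gauge metric family: HELP line, TYPE line, one line per sample.
function formatGauge(name: string, help: string, samples: GaugeSample[]): string {
  const lines: string[] = [`# HELP ${name} ${help}`, `# TYPE ${name} gauge`];
  for (const sample of samples) {
    const pairs = Object.keys(sample.labels).map(
      (key) => `${key}="${escapeLabelValue(sample.labels[key])}"`,
    );
    const labelText = pairs.length > 0 ? `{${pairs.join(",")}}` : "";
    lines.push(`${name}${labelText} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Counters and histograms would need their own TYPE lines and, for histograms, the cumulative `_bucket`, `_sum` and `_count` series shown in the example.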

8. Migration Strategy

Breaking Changes

  • Completely replace the old metrics API with the new clean design
  • Remove all get* prefixed methods in favor of grouped properties
  • Use simple { in, out } objects instead of verbose property names
  • Provide clear migration guide in documentation

Implementation Approach

  1. Create new ThroughputTracker class for time-series data
  2. Implement new IMetrics interface with clean API
  3. Replace MetricsCollector implementation entirely
  4. Update all references to use new API
  5. ⚠️ Add comprehensive tests for accuracy validation (partial)

Additional Refactoring Completed

  • Refactored all SmartProxy components to use cleaner dependency pattern
  • Components now receive only SmartProxy instance instead of individual dependencies
  • Access to other components via this.smartProxy.componentName
  • Significantly simplified constructor signatures across the codebase
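
The dependency pattern described above might look roughly like the sketch below. All class names carry a Sketch suffix because they are illustrative stand-ins, not the actual SmartProxy classes, and the tracked counter is invented for the example.

```typescript
// Every component receives only the proxy instance and reaches its
// collaborators through it.
abstract class ComponentSketch {
  constructor(protected readonly smartProxy: SmartProxySketch) {}
}

class ConnectionManagerSketch extends ComponentSketch {
  private activeCount = 0;

  public trackConnection(): void {
    this.activeCount++;
  }

  public getActiveCount(): number {
    return this.activeCount;
  }
}

class MetricsCollectorSketch extends ComponentSketch {
  // Access to another component via this.smartProxy.componentName.
  public activeConnections(): number {
    return this.smartProxy.connectionManager.getActiveCount();
  }
}

class SmartProxySketch {
  public readonly connectionManager: ConnectionManagerSketch = new ConnectionManagerSketch(this);
  public readonly metricsCollector: MetricsCollectorSketch = new MetricsCollectorSketch(this);
}
```

The trade-off is that components become coupled to the proxy as a whole, in exchange for one-argument constructors.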

9. Success Metrics

  • Accuracy: Throughput metrics within 1% of the actual bytes transferred
  • Performance: < 1% CPU overhead for metrics collection
  • Memory: < 10 MB of memory for 1 hour of data
  • Latency: < 1ms to retrieve any metric
  • Reliability: No metrics data loss under load

10. Future Enhancements

Phase 5: Advanced Analytics

  • Anomaly detection for traffic patterns
  • Predictive analytics for capacity planning
  • Correlation analysis between routes
  • Real-time alerting integration

Phase 6: Distributed Metrics

  • Metrics aggregation across multiple proxies
  • Distributed time-series storage
  • Cross-proxy analytics
  • Global dashboard support

11. Risks and Mitigations

Risk: Memory Usage

  • Mitigation: Circular buffers and configurable retention
  • Monitoring: Track memory usage per metric type

Risk: Performance Impact

  • Mitigation: Efficient data structures and caching
  • Testing: Load test with metrics enabled/disabled

Risk: Data Accuracy

  • Mitigation: Atomic operations and proper synchronization
  • Validation: Compare with external monitoring tools

Conclusion

This plan transforms SmartProxy's metrics from a basic cumulative system into a comprehensive, time-series-based monitoring solution suitable for production environments. The phased approach ensures minimal disruption while delivering immediate value through accurate throughput measurements.