v25.11.23

fix(rustproxy-http,rustproxy-metrics): reduce per-frame metrics overhead by batching body byte accounting
v25.11.22
2026-03-17 12:22:51 +00:00 · 2026-03-17 12:12:24 +00:00 · 2026-03-17 11:33:34 +00:00
6 changed files with 125 additions and 84 deletions
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,30 @@
 # Changelog

+## 2026-03-17 - 25.11.23 - fix(rustproxy-http,rustproxy-metrics)
+reduce per-frame metrics overhead by batching body byte accounting
+
+- Buffer HTTP body byte counts and flush them every 64 KB, at end of stream, and on drop to keep totals accurate while preserving throughput sampling.
+- Skip zero-value counter updates in metrics collection to avoid unnecessary atomic and DashMap operations for the unused direction.
+
+## 2026-03-17 - 25.11.22 - fix(rustproxy-http)
+reuse healthy HTTP/2 upstream connections after requests with bodies
+
+- Registers successful HTTP/2 connections in the pool regardless of whether the proxied request included a body
+- Continues to avoid pooling upstream connections that returned 502 Bad Gateway responses
+
+## 2026-03-17 - 25.11.21 - fix(rustproxy-http)
+reuse pooled HTTP/2 connections for requests with and without bodies
+
+- remove the bodyless-request restriction from HTTP/2 pool checkout
+- always return successful HTTP/2 senders to the connection pool after requests
+
+## 2026-03-17 - 25.11.20 - fix(rustproxy-http)
+avoid downgrading cached backend protocol on H2 stream errors
+
+- Treat HTTP/2 stream-level failures as retryable request errors instead of evidence that the backend only supports HTTP/1.1
+- Keep protocol cache entries unchanged after successful H2 handshakes so future requests continue using HTTP/2
+- Lower log severity for this fallback path from warning to debug while still recording backend H2 failure metrics
+
 ## 2026-03-16 - 25.11.19 - fix(rustproxy-http)
 avoid reusing pooled HTTP/2 connections for requests with bodies to prevent upload flow-control stalls

--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "@push.rocks/smartproxy",
-  "version": "25.11.19",
+  "version": "25.11.23",
  "private": false,
  "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
  "main": "dist_ts/index.js",
--- a/rust/crates/rustproxy-http/src/counting_body.rs
+++ b/rust/crates/rustproxy-http/src/counting_body.rs
@@ -9,10 +9,17 @@ use bytes::Bytes;
 use http_body::Frame;
 use rustproxy_metrics::MetricsCollector;

+/// Flush accumulated bytes to the metrics collector every 64 KB.
+/// This reduces per-frame DashMap shard-locked reads from ~15 to ~1 per 4 frames
+/// (assuming typical 16 KB upload frames).  The 1 Hz throughput sampler still sees
+/// data within one sampling period even at low transfer rates.
+const BYTE_FLUSH_THRESHOLD: u64 = 65_536;
+
 /// Wraps any `http_body::Body` and counts data bytes passing through.
 ///
-/// Each chunk is reported to the `MetricsCollector` immediately so that
-/// the throughput tracker (sampled at 1 Hz) reflects real-time data flow.
+/// Bytes are accumulated and flushed to the `MetricsCollector` every
+/// [`BYTE_FLUSH_THRESHOLD`] bytes (and on Drop) so the throughput tracker
+/// (sampled at 1 Hz) reflects real-time data flow without per-frame overhead.
 ///
 /// The inner body is pinned on the heap to support `!Unpin` types like `hyper::body::Incoming`.
 pub struct CountingBody<B> {
@@ -22,6 +29,8 @@ pub struct CountingBody<B> {
    source_ip: Option<String>,
    /// Whether we count bytes as "in" (request body) or "out" (response body).
    direction: Direction,
+    /// Accumulated bytes not yet flushed to the metrics collector.
+    pending_bytes: u64,
    /// Optional connection-level activity tracker. When set, poll_frame updates this
    /// to keep the idle watchdog alive during active body streaming (uploads/downloads).
    connection_activity: Option<Arc<AtomicU64>>,
@@ -57,6 +66,7 @@ impl<B> CountingBody<B> {
            route_id,
            source_ip,
            direction,
+            pending_bytes: 0,
            connection_activity: None,
            activity_start: None,
            active_requests: None,
@@ -81,14 +91,19 @@ impl<B> CountingBody<B> {
        self
    }

-    /// Report a chunk of bytes immediately to the metrics collector.
+    /// Flush accumulated bytes to the metrics collector.
    #[inline]
-    fn report_chunk(&self, len: u64) {
+    fn flush_pending(&mut self) {
+        if self.pending_bytes == 0 {
+            return;
+        }
+        let bytes = self.pending_bytes;
+        self.pending_bytes = 0;
        let route_id = self.route_id.as_deref();
        let source_ip = self.source_ip.as_deref();
        match self.direction {
-            Direction::In => self.metrics.record_bytes(len, 0, route_id, source_ip),
-            Direction::Out => self.metrics.record_bytes(0, len, route_id, source_ip),
+            Direction::In => self.metrics.record_bytes(bytes, 0, route_id, source_ip),
+            Direction::Out => self.metrics.record_bytes(0, bytes, route_id, source_ip),
        }
    }
 }
@@ -113,9 +128,12 @@ where
            Poll::Ready(Some(Ok(frame))) => {
                if let Some(data) = frame.data_ref() {
                    let len = data.len() as u64;
-                    // Report bytes immediately so the 1 Hz throughput sampler sees them
-                    this.report_chunk(len);
-                    // Keep the connection-level idle watchdog alive during body streaming
+                    this.pending_bytes += len;
+                    if this.pending_bytes >= BYTE_FLUSH_THRESHOLD {
+                        this.flush_pending();
+                    }
+                    // Keep the connection-level idle watchdog alive on every frame
+                    // (this is just one atomic store — cheap enough per-frame)
                    if let (Some(activity), Some(start)) = (&this.connection_activity, &this.activity_start) {
                        activity.store(start.elapsed().as_millis() as u64, Ordering::Relaxed);
                    }
@@ -123,7 +141,11 @@ where
                Poll::Ready(Some(Ok(frame)))
            }
            Poll::Ready(Some(Err(e))) => Poll::Ready(Some(Err(e))),
-            Poll::Ready(None) => Poll::Ready(None),
+            Poll::Ready(None) => {
+                // End of stream — flush any remaining bytes
+                this.flush_pending();
+                Poll::Ready(None)
+            }
            Poll::Pending => Poll::Pending,
        }
    }
@@ -139,6 +161,8 @@ where

 impl<B> Drop for CountingBody<B> {
    fn drop(&mut self) {
+        // Flush any remaining accumulated bytes so totals stay accurate
+        self.flush_pending();
        // Decrement the active-request counter so the HTTP idle watchdog
        // knows this response body is no longer streaming.
        if let Some(ref counter) = self.active_requests {
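The batching in `counting_body.rs` can be tried in isolation. The sketch below is a simplified model under stated assumptions, not the crate's code: `SharedTotals` and `BatchedCounter` are hypothetical stand-ins for `MetricsCollector` and `CountingBody`, and a `flushes` counter is added only to make the saving visible. With nine 16 KB frames, the shared counters are touched three times (twice at the 64 KB threshold, once on drop) instead of nine, and the byte total stays exact.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Same threshold as in the diff: flush once 64 KB have accumulated.
const BYTE_FLUSH_THRESHOLD: u64 = 65_536;

/// Hypothetical stand-in for the real `MetricsCollector`.
#[derive(Default)]
struct SharedTotals {
    bytes: AtomicU64,
    /// Illustration only: counts how often the shared state is touched.
    flushes: AtomicU64,
}

/// Hypothetical stand-in for `CountingBody`: accumulates locally and
/// reports to the shared totals in batches.
struct BatchedCounter<'a> {
    totals: &'a SharedTotals,
    pending_bytes: u64,
}

impl<'a> BatchedCounter<'a> {
    fn new(totals: &'a SharedTotals) -> Self {
        Self { totals, pending_bytes: 0 }
    }

    /// Called once per data frame; only flushes when the threshold is reached.
    fn on_frame(&mut self, len: u64) {
        self.pending_bytes += len;
        if self.pending_bytes >= BYTE_FLUSH_THRESHOLD {
            self.flush_pending();
        }
    }

    /// Moves the locally accumulated bytes into the shared totals.
    fn flush_pending(&mut self) {
        if self.pending_bytes == 0 {
            return;
        }
        let bytes = std::mem::take(&mut self.pending_bytes);
        self.totals.bytes.fetch_add(bytes, Ordering::Relaxed);
        self.totals.flushes.fetch_add(1, Ordering::Relaxed);
    }
}

impl Drop for BatchedCounter<'_> {
    fn drop(&mut self) {
        // Flush the remainder so totals stay accurate for aborted streams too.
        self.flush_pending();
    }
}

fn main() {
    let totals = SharedTotals::default();
    {
        let mut counter = BatchedCounter::new(&totals);
        for _ in 0..9 {
            counter.on_frame(16 * 1024);
        }
    } // counter dropped here, remaining 16 KB flushed
    println!(
        "bytes = {}, flushes = {}",
        totals.bytes.load(Ordering::Relaxed),
        totals.flushes.load(Ordering::Relaxed)
    );
}
```

The flush on drop is what keeps totals correct when a body is abandoned before end of stream, for example on a client disconnect.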
--- a/rust/crates/rustproxy-http/src/proxy_service.rs
+++ b/rust/crates/rustproxy-http/src/proxy_service.rs
@@ -677,20 +677,10 @@ impl HttpProxyService {
                h2: use_h2,
            };

-            // H2 pool checkout — only for bodyless requests (GET/HEAD/DELETE).
-            //
-            // WORKAROUND: Requests with bodies (POST/PUT uploads) always get fresh H2
-            // connections. Reusing a pooled H2 connection after a large upload can stall
-            // forever due to depleted connection-level flow control windows. The h2 crate
-            // has no stall/timeout detection (https://github.com/hyperium/hyper/issues/2899),
-            // and Go/nginx HTTP/2 servers have known issues with connection-level window
-            // replenishment after large transfers (https://github.com/golang/go/issues/16481,
-            // https://github.com/golang/go/issues/56558). A fresh connection guarantees
-            // clean flow control state. The overhead is ~3-5ms for TLS+H2 handshake.
-            //
-            // TODO: Revisit once h2 crate adds flow control stall detection, or once
-            // Go/nginx H2 connection-level window handling is confirmed reliable.
-            if use_h2 && body.is_end_stream() {
+            // H2 pool checkout — reuse pooled connections for all requests.
+            // The h2 crate properly replenishes connection-level flow control
+            // windows via release_capacity() as data is consumed.
+            if use_h2 {
                if let Some((mut sender, age)) = self.connection_pool.checkout_h2(&pool_key) {
                    match tokio::time::timeout(
                        std::time::Duration::from_millis(500),
@@ -1048,12 +1038,9 @@ impl HttpProxyService {
            });
        }

-        // Only pool the H2 connection if the request had no body.
-        // Requests with bodies (uploads) deplete connection-level flow control windows.
-        let request_had_body = !body.is_end_stream();
        let sender_for_pool = sender.clone();
        let result = self.forward_h2_with_sender(sender, parts, body, upstream_headers, upstream_path, route, route_id, source_ip, Some(pool_key), domain, conn_activity).await;
-        if !request_had_body && matches!(&result, Ok(ref resp) if resp.status() != StatusCode::BAD_GATEWAY) {
+        if matches!(&result, Ok(ref resp) if resp.status() != StatusCode::BAD_GATEWAY) {
            let g = self.connection_pool.register_h2(pool_key.clone(), sender_for_pool);
            gen_holder.store(g, std::sync::atomic::Ordering::Relaxed);
        }
@@ -1379,32 +1366,24 @@ impl HttpProxyService {

                match sender.send_request(upstream_req).await {
                    Ok(upstream_response) => {
-                        // Only pool after bodyless requests — uploads deplete connection-level
-                        // flow control windows (see comment at pool checkout above).
-                        if retry_state.is_some() {
-                            let g = self.connection_pool.register_h2(pool_key.clone(), sender);
-                            gen_holder.store(g, std::sync::atomic::Ordering::Relaxed);
-                        }
+                        let g = self.connection_pool.register_h2(pool_key.clone(), sender);
+                        gen_holder.store(g, std::sync::atomic::Ordering::Relaxed);
                        self.build_streaming_response(upstream_response, route, route_id, source_ip, conn_activity).await
                    }
                    Err(e) => {
-                        // H2 request failed — backend advertises h2 via ALPN but doesn't
-                        // actually speak it. Update cache so future requests use H1.
+                        // H2 request failed on a stream level (e.g. RST_STREAM PROTOCOL_ERROR).
+                        // The H2 handshake succeeded, so the backend genuinely speaks H2 — don't
+                        // poison the protocol cache.  Only handshake-level failures (below) should
+                        // downgrade the cache to H1.
                        let bk = format!("{}:{}", upstream.host, upstream.port);
-                        warn!(
+                        debug!(
                            backend = %bk,
                            domain = %domain,
                            error = %e,
                            error_debug = ?e,
-                            "Auto-detect: H2 request failed, falling back to H1"
+                            "H2 stream error, retrying this request as H1"
                        );
                        self.metrics.backend_h2_failure(&bk);
-                        let cache_key = crate::protocol_cache::ProtocolCacheKey {
-                            host: upstream.host.clone(),
-                            port: upstream.port,
-                            requested_host: requested_host.clone(),
-                        };
-                        self.protocol_cache.insert(cache_key, crate::protocol_cache::DetectedProtocol::H1);

                        // Retry as H1 for bodyless requests; return 502 for requests with bodies
                        if let Some((method, headers)) = retry_state {
--- a/rust/crates/rustproxy-metrics/src/collector.rs
+++ b/rust/crates/rustproxy-metrics/src/collector.rs
@@ -259,40 +259,49 @@ impl MetricsCollector {
    /// Called per-chunk in the TCP copy loop. Only touches AtomicU64 counters —
    /// no Mutex is taken. The throughput trackers are fed during `sample_all()`.
    pub fn record_bytes(&self, bytes_in: u64, bytes_out: u64, route_id: Option<&str>, source_ip: Option<&str>) {
-        self.total_bytes_in.fetch_add(bytes_in, Ordering::Relaxed);
-        self.total_bytes_out.fetch_add(bytes_out, Ordering::Relaxed);
-
-        // Accumulate into lock-free pending throughput counters
-        self.global_pending_tp_in.fetch_add(bytes_in, Ordering::Relaxed);
-        self.global_pending_tp_out.fetch_add(bytes_out, Ordering::Relaxed);
+        // Short-circuit: only touch counters for the direction that has data.
+        // CountingBody always calls with one direction zero — skipping the zero
+        // direction avoids ~50% of DashMap shard-locked reads per call.
+        if bytes_in > 0 {
+            self.total_bytes_in.fetch_add(bytes_in, Ordering::Relaxed);
+            self.global_pending_tp_in.fetch_add(bytes_in, Ordering::Relaxed);
+        }
+        if bytes_out > 0 {
+            self.total_bytes_out.fetch_add(bytes_out, Ordering::Relaxed);
+            self.global_pending_tp_out.fetch_add(bytes_out, Ordering::Relaxed);
+        }

        // Per-route tracking: use get() first (zero-alloc fast path for existing entries),
        // fall back to entry() with to_string() only on the rare first-chunk miss.
        if let Some(route_id) = route_id {
-            if let Some(counter) = self.route_bytes_in.get(route_id) {
-                counter.fetch_add(bytes_in, Ordering::Relaxed);
-            } else {
-                self.route_bytes_in.entry(route_id.to_string())
-                    .or_insert_with(|| AtomicU64::new(0))
-                    .fetch_add(bytes_in, Ordering::Relaxed);
+            if bytes_in > 0 {
+                if let Some(counter) = self.route_bytes_in.get(route_id) {
+                    counter.fetch_add(bytes_in, Ordering::Relaxed);
+                } else {
+                    self.route_bytes_in.entry(route_id.to_string())
+                        .or_insert_with(|| AtomicU64::new(0))
+                        .fetch_add(bytes_in, Ordering::Relaxed);
+                }
            }
-            if let Some(counter) = self.route_bytes_out.get(route_id) {
-                counter.fetch_add(bytes_out, Ordering::Relaxed);
-            } else {
-                self.route_bytes_out.entry(route_id.to_string())
-                    .or_insert_with(|| AtomicU64::new(0))
-                    .fetch_add(bytes_out, Ordering::Relaxed);
+            if bytes_out > 0 {
+                if let Some(counter) = self.route_bytes_out.get(route_id) {
+                    counter.fetch_add(bytes_out, Ordering::Relaxed);
+                } else {
+                    self.route_bytes_out.entry(route_id.to_string())
+                        .or_insert_with(|| AtomicU64::new(0))
+                        .fetch_add(bytes_out, Ordering::Relaxed);
+                }
            }

            // Accumulate into per-route pending throughput counters (lock-free)
            if let Some(entry) = self.route_pending_tp.get(route_id) {
-                entry.0.fetch_add(bytes_in, Ordering::Relaxed);
-                entry.1.fetch_add(bytes_out, Ordering::Relaxed);
+                if bytes_in > 0 { entry.0.fetch_add(bytes_in, Ordering::Relaxed); }
+                if bytes_out > 0 { entry.1.fetch_add(bytes_out, Ordering::Relaxed); }
            } else {
                let entry = self.route_pending_tp.entry(route_id.to_string())
                    .or_insert_with(|| (AtomicU64::new(0), AtomicU64::new(0)));
-                entry.0.fetch_add(bytes_in, Ordering::Relaxed);
-                entry.1.fetch_add(bytes_out, Ordering::Relaxed);
+                if bytes_in > 0 { entry.0.fetch_add(bytes_in, Ordering::Relaxed); }
+                if bytes_out > 0 { entry.1.fetch_add(bytes_out, Ordering::Relaxed); }
            }
        }

@@ -302,30 +311,34 @@ impl MetricsCollector {
            // This prevents orphaned entries when record_bytes races with
            // connection_closed (which evicts all per-IP data on last close).
            if self.ip_connections.contains_key(ip) {
-                if let Some(counter) = self.ip_bytes_in.get(ip) {
-                    counter.fetch_add(bytes_in, Ordering::Relaxed);
-                } else {
-                    self.ip_bytes_in.entry(ip.to_string())
-                        .or_insert_with(|| AtomicU64::new(0))
-                        .fetch_add(bytes_in, Ordering::Relaxed);
+                if bytes_in > 0 {
+                    if let Some(counter) = self.ip_bytes_in.get(ip) {
+                        counter.fetch_add(bytes_in, Ordering::Relaxed);
+                    } else {
+                        self.ip_bytes_in.entry(ip.to_string())
+                            .or_insert_with(|| AtomicU64::new(0))
+                            .fetch_add(bytes_in, Ordering::Relaxed);
+                    }
                }
-                if let Some(counter) = self.ip_bytes_out.get(ip) {
-                    counter.fetch_add(bytes_out, Ordering::Relaxed);
-                } else {
-                    self.ip_bytes_out.entry(ip.to_string())
-                        .or_insert_with(|| AtomicU64::new(0))
-                        .fetch_add(bytes_out, Ordering::Relaxed);
+                if bytes_out > 0 {
+                    if let Some(counter) = self.ip_bytes_out.get(ip) {
+                        counter.fetch_add(bytes_out, Ordering::Relaxed);
+                    } else {
+                        self.ip_bytes_out.entry(ip.to_string())
+                            .or_insert_with(|| AtomicU64::new(0))
+                            .fetch_add(bytes_out, Ordering::Relaxed);
+                    }
                }

                // Accumulate into per-IP pending throughput counters (lock-free)
                if let Some(entry) = self.ip_pending_tp.get(ip) {
-                    entry.0.fetch_add(bytes_in, Ordering::Relaxed);
-                    entry.1.fetch_add(bytes_out, Ordering::Relaxed);
+                    if bytes_in > 0 { entry.0.fetch_add(bytes_in, Ordering::Relaxed); }
+                    if bytes_out > 0 { entry.1.fetch_add(bytes_out, Ordering::Relaxed); }
                } else {
                    let entry = self.ip_pending_tp.entry(ip.to_string())
                        .or_insert_with(|| (AtomicU64::new(0), AtomicU64::new(0)));
-                    entry.0.fetch_add(bytes_in, Ordering::Relaxed);
-                    entry.1.fetch_add(bytes_out, Ordering::Relaxed);
+                    if bytes_in > 0 { entry.0.fetch_add(bytes_in, Ordering::Relaxed); }
+                    if bytes_out > 0 { entry.1.fetch_add(bytes_out, Ordering::Relaxed); }
                }
            }
        }
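The zero-direction short-circuit in `record_bytes` can be illustrated with a reduced model. This is a sketch under assumptions, not the collector itself: `MiniCollector` is hypothetical, `RwLock<HashMap<..>>` from the standard library is swapped in for DashMap so the example has no external dependencies, and the `map_lookups` counter exists only to show how many map accesses are made. The structure is also simplified to one map per direction, without the pending-throughput and per-IP maps.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

type CounterMap = RwLock<HashMap<String, AtomicU64>>;

/// Hypothetical, simplified collector; `RwLock<HashMap>` stands in for DashMap.
#[derive(Default)]
struct MiniCollector {
    total_in: AtomicU64,
    total_out: AtomicU64,
    route_in: CounterMap,
    route_out: CounterMap,
    /// Illustration only: counts map accesses so the saving is observable.
    map_lookups: AtomicU64,
}

impl MiniCollector {
    fn add_to_map(&self, map: &CounterMap, key: &str, n: u64) {
        self.map_lookups.fetch_add(1, Ordering::Relaxed);
        {
            // Fast path: shared lock, no allocation for existing entries.
            let guard = map.read().unwrap();
            if let Some(counter) = guard.get(key) {
                counter.fetch_add(n, Ordering::Relaxed);
                return;
            }
        } // read guard released before taking the write lock
        // Slow path: first sighting of this key, allocate the entry.
        map.write()
            .unwrap()
            .entry(key.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(n, Ordering::Relaxed);
    }

    /// Only touches counters for a direction that actually carries bytes.
    fn record_bytes(&self, bytes_in: u64, bytes_out: u64, route_id: Option<&str>) {
        if bytes_in > 0 {
            self.total_in.fetch_add(bytes_in, Ordering::Relaxed);
            if let Some(route) = route_id {
                self.add_to_map(&self.route_in, route, bytes_in);
            }
        }
        if bytes_out > 0 {
            self.total_out.fetch_add(bytes_out, Ordering::Relaxed);
            if let Some(route) = route_id {
                self.add_to_map(&self.route_out, route, bytes_out);
            }
        }
    }
}

/// Reads a per-route total, returning 0 for unknown routes.
fn route_total(map: &CounterMap, key: &str) -> u64 {
    let guard = map.read().unwrap();
    guard.get(key).map_or(0, |c| c.load(Ordering::Relaxed))
}

fn main() {
    let m = MiniCollector::default();
    m.record_bytes(1000, 0, Some("api"));
    m.record_bytes(1000, 0, Some("api"));
    m.record_bytes(0, 500, Some("api"));
    println!(
        "in = {}, out = {}, map lookups = {}",
        route_total(&m.route_in, "api"),
        route_total(&m.route_out, "api"),
        m.map_lookups.load(Ordering::Relaxed)
    );
}
```

Three calls that each carry one direction perform three map accesses in this model; without the zero checks they would perform six.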
--- a/ts/00_commitinfo_data.ts
+++ b/ts/00_commitinfo_data.ts
@@ -3,6 +3,6 @@
 */
 export const commitinfo = {
  name: '@push.rocks/smartproxy',
-  version: '25.11.19',
+  version: '25.11.23',
  description: 'A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.'
 }
Author	SHA1	Message	Date
Juergen Kunz	5dccbbc9d1	v25.11.23	2026-03-17 12:22:51 +00:00
Juergen Kunz	92d7113c6c	fix(rustproxy-http,rustproxy-metrics): reduce per-frame metrics overhead by batching body byte accounting	2026-03-17 12:22:51 +00:00
Juergen Kunz	8f6bb30367	v25.11.22	2026-03-17 12:12:24 +00:00
Juergen Kunz	ef9bac80ff	fix(rustproxy-http): reuse healthy HTTP/2 upstream connections after requests with bodies	2026-03-17 12:12:24 +00:00
Juergen Kunz	9c78701038	v25.11.21	2026-03-17 11:33:34 +00:00
Juergen Kunz	26fd9409a7	fix(rustproxy-http): reuse pooled HTTP/2 connections for requests with and without bodies	2026-03-17 11:33:34 +00:00
Juergen Kunz	cfff128499	v25.11.20	2026-03-17 01:32:35 +00:00
Juergen Kunz	3baff354bd	fix(rustproxy-http): avoid downgrading cached backend protocol on H2 stream errors	2026-03-17 01:32:35 +00:00