v1.19.0

feat(proxy-engine,codec-lib): add adaptive RTP jitter buffering with Opus packet loss concealment and stable 20ms resampling
v1.18.0
2026-04-10 21:15:34 +00:00 · 2026-04-10 17:25:34 +00:00
9 changed files with 547 additions and 175 deletions
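The new `jitter_buffer.rs` (partially shown at the end of this diff) keys packets by RTP sequence number and hands the mixer one result per 20ms tick. The sketch below is a simplified, hypothetical illustration of that playout idea, not the crate's code. `Pkt`, `Tick`, and `SketchJitter` are invented names. The depth is fixed rather than adaptive. The rule that drops packets arriving behind the playout point is an assumption of this sketch. Packets are looked up by the expected sequence number rather than by taking the smallest `BTreeMap` key, which keeps playout correct across the 16-bit wrap from 65535 to 0.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's `RtpPacket` (only what the sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub struct Pkt {
    pub seq: u16,
    pub payload: Vec<u8>,
}

/// Per-tick result, mirroring the Packet / Missing / Filling idea.
#[derive(Debug, PartialEq)]
pub enum Tick {
    Packet(Pkt),
    Missing,
    Filling,
}

/// Simplified fixed-depth jitter buffer (hypothetical; no adaptation or jump reset).
pub struct SketchJitter {
    buffer: BTreeMap<u16, Pkt>,
    next_seq: Option<u16>,
    target_depth: usize,
    playing: bool,
}

impl SketchJitter {
    pub fn new(target_depth: usize) -> Self {
        Self { buffer: BTreeMap::new(), next_seq: None, target_depth, playing: false }
    }

    pub fn push(&mut self, pkt: Pkt) {
        match self.next_seq {
            Some(next) => {
                // Assumed rule: drop packets behind the playout point,
                // using 16-bit serial-number arithmetic to survive wrap-around.
                let behind = next.wrapping_sub(pkt.seq);
                if behind != 0 && behind < 0x8000 {
                    return;
                }
            }
            None => self.next_seq = Some(pkt.seq),
        }
        // Duplicates are ignored: an existing entry is kept as-is.
        self.buffer.entry(pkt.seq).or_insert(pkt);
    }

    /// Called once per 20ms mixer tick.
    pub fn pop(&mut self) -> Tick {
        if !self.playing {
            // Initial fill: hold playout until `target_depth` packets are queued.
            if self.buffer.len() < self.target_depth {
                return Tick::Filling;
            }
            self.playing = true;
        }
        let Some(seq) = self.next_seq else { return Tick::Filling };
        self.next_seq = Some(seq.wrapping_add(1));
        match self.buffer.remove(&seq) {
            Some(pkt) => Tick::Packet(pkt),
            None => Tick::Missing, // caller would run codec PLC here
        }
    }
}
```

The caller is responsible for turning a `Tick::Missing` into concealed audio, for example via Opus PLC as the changelog describes.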
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,19 @@
 # Changelog

+## 2026-04-10 - 1.19.0 - feat(proxy-engine,codec-lib)
+add adaptive RTP jitter buffering with Opus packet loss concealment and stable 20ms resampling
+
+- introduces a per-leg adaptive jitter buffer in the mixer to reorder RTP packets, gate initial playout, and deliver one frame per 20ms tick
+- adds Opus PLC support to synthesize missing audio frames when packets are lost, with fade-based fallback handling for non-Opus codecs
+- updates i16 and f32 resamplers to use canonical 20ms chunks so cached resamplers preserve filter state and avoid variable-size cache thrashing
+
+## 2026-04-10 - 1.18.0 - feat(readme)
+expand documentation for voicemail, IVR, audio engine, and API capabilities
+
+- Updates the feature overview to document voicemail, IVR menus, call recording, enhanced TTS, and the 48kHz float audio engine
+- Refreshes the architecture section to describe the TypeScript control plane, Rust proxy-engine data plane, and JSON-over-stdio IPC
+- Clarifies REST API and WebSocket coverage with voicemail endpoints, incoming call events, and refined endpoint descriptions
+
 ## 2026-04-10 - 1.17.2 - fix(proxy-engine)
 use negotiated SDP payload types when wiring SIP legs and enable default nnnoiseless features for telephony denoising

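The changelog mentions a fade-based fallback for non-Opus codecs when a packet is missing. The fade logic itself is not part of this excerpt, so what follows is only one plausible approach, written as a hypothetical `FadeConcealer`. It replays the last good 48kHz f32 frame with a linear gain ramp. The ramp reaches silence after an assumed `FADE_FRAMES` consecutive losses, and the concealer emits silence from then until audio resumes.

```rust
/// Hypothetical concealer for codecs without built-in PLC (not the proxy-engine code).
pub struct FadeConcealer {
    last_frame: Vec<f32>,
    lost_streak: u32,
}

impl FadeConcealer {
    /// Consecutive lost frames over which the replay fades to silence (assumed value).
    const FADE_FRAMES: u32 = 3;

    pub fn new(frame_len: usize) -> Self {
        Self { last_frame: vec![0.0; frame_len], lost_streak: 0 }
    }

    /// Remember the most recent successfully decoded frame and reset the loss streak.
    pub fn on_good_frame(&mut self, pcm: &[f32]) {
        self.last_frame.clear();
        self.last_frame.extend_from_slice(pcm);
        self.lost_streak = 0;
    }

    /// Produce one concealment frame for a missing packet.
    pub fn conceal(&mut self) -> Vec<f32> {
        self.lost_streak += 1;
        let n = self.last_frame.len();
        if self.lost_streak > Self::FADE_FRAMES || n == 0 {
            return vec![0.0; n];
        }
        // Gain ramps linearly within this frame, continuing where the previous lost frame ended.
        let start_gain = 1.0 - (self.lost_streak - 1) as f32 / Self::FADE_FRAMES as f32;
        let end_gain = 1.0 - self.lost_streak as f32 / Self::FADE_FRAMES as f32;
        self.last_frame
            .iter()
            .enumerate()
            .map(|(i, &s)| {
                let t = i as f32 / n as f32;
                s * (start_gain + (end_gain - start_gain) * t)
            })
            .collect()
    }
}
```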
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "siprouter",
-  "version": "1.17.2",
+  "version": "1.19.0",
  "private": true,
  "type": "module",
  "scripts": {
--- a/readme.md
+++ b/readme.md
@@ -1,6 +1,6 @@
 # @serve.zone/siprouter

-A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS announcements, and a slick web dashboard.
+A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS, voicemail, IVR menus, and a slick web dashboard.

 ## Issue Reporting and Security

@@ -12,14 +12,16 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community

 siprouter sits between your SIP trunk providers and your endpoints — hardware phones, ATAs, browser softphones — and handles **everything** in between:

-- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management
-- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional audio to the SIP network
-- 🎛️ **Multi-Provider Trunking** — Register with multiple SIP providers simultaneously (sipgate, easybell, o2, etc.)
-- 🔊 **Rust Codec Engine** — Real-time Opus ↔ G.722 ↔ PCMU ↔ PCMA transcoding in native Rust
-- 🤖 **ML Noise Suppression** — RNNoise denoiser with per-direction state (to SIP / to browser)
-- 🗣️ **Neural TTS** — Kokoro-powered "connecting your call" announcements, pre-encoded for instant playback
-- 🔀 **Hub Model Calls** — N-leg calls with dynamic add/remove, transfer, and RTP fan-out
-- 🖥️ **Web Dashboard** — Real-time SPA with live call monitoring, browser phone, contact management, provider config
+- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management, digest auth, and SDP negotiation
+- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional Opus audio to the SIP network
+- 🎛️ **Multi-Provider Trunking** — Register with multiple SIP providers simultaneously (sipgate, easybell, etc.) with automatic failover
+- 🎧 **48kHz f32 Audio Engine** — High-fidelity internal audio bus at 48kHz/32-bit float with native Opus float encode/decode, FFT-based resampling, and per-leg ML noise suppression
+- 🔀 **N-Leg Mix-Minus Mixer** — Conference-grade mixing with dynamic leg add/remove, transfer, and per-source audio separation
+- 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
+- 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
+- 🗣️ **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets and an espeak-ng fallback
+- 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
+- 🖥️ **Web Dashboard** — Real-time SPA with 9 views: overview, live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs

 ---

@@ -35,32 +37,38 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
 ┌──────────────────────────────────────┐
 │            siprouter                  │
 │                                      │
-│  ┌──────────┐  ┌──────────────────┐  │
-│  │ Call Hub  │  │  Rust Transcoder │  │
-│  │  N legs   │──│  Opus/G.722/PCM │  │
-│  │  fan-out  │  │  + RNNoise      │  │
-│  └────┬─────┘  └──────────────────┘  │
-│       │                              │
-│  ┌────┴─────┐  ┌──────────────────┐  │
-│  │ SIP Stack│  │  Kokoro TTS      │  │
-│  │ Dialog SM│  │  (ONNX Runtime)  │  │
-│  └────┬─────┘  └──────────────────┘  │
-│       │                              │
-│  ┌────┴──────────────────────────┐   │
-│  │   Local Registrar + Provider  │   │
-│  │   Registration Engine         │   │
-│  └───────────────────────────────┘   │
-└──────────┬──────────────┬────────────┘
-           │              │
-    ┌──────┴──────┐ ┌─────┴──────┐
-    │ SIP Devices │ │ SIP Trunk  │
-    │ (HT801, etc)│ │ Providers  │
-    └─────────────┘ └────────────┘
+│  TypeScript Control Plane            │
+│  ┌────────────────────────────────┐  │
+│  │ Config · WebRTC Signaling      │  │
+│  │ REST API · Web Dashboard       │  │
+│  │ Voicebox Manager · TTS Cache   │  │
+│  └────────────┬───────────────────┘  │
+│          JSON-over-stdio IPC         │
+│  ┌────────────┴───────────────────┐  │
+│  │ Rust proxy-engine (data plane) │  │
+│  │                                │  │
+│  │ SIP Stack · Dialog SM · Auth   │  │
+│  │ Call Manager · N-Leg Mixer     │  │
+│  │ 48kHz f32 Bus · RNNoise        │  │
+│  │ Codec Engine · RTP Port Pool   │  │
+│  │ WebRTC Engine · Kokoro TTS     │  │
+│  │ Voicemail · IVR · Recording    │  │
+│  └────┬──────────────────┬────────┘  │
+└───────┼──────────────────┼───────────┘
+        │                  │
+ ┌──────┴──────┐    ┌──────┴──────┐
+ │ SIP Devices │    │ SIP Trunk   │
+ │ (HT801 etc) │    │ Providers   │
+ └─────────────┘    └─────────────┘
 ```

-### The Hub Model
+### 🧠 Key Design Decisions

-Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware device or provider) or a `WebRtcLeg` (browser). RTP flows through the hub — each leg's received audio is forwarded to all other legs, with codec transcoding handled transparently by the Rust engine.
+- **Hub Model** — Every call is a hub with N legs. Each leg is a `SipLeg` (device/provider) or `WebRtcLeg` (browser). Legs can be dynamically added, removed, or transferred without tearing down the call.
+- **Rust Data Plane** — All SIP protocol handling, codec transcoding, mixing, and RTP I/O runs in native Rust for real-time performance. TypeScript handles config, signaling, REST API, and dashboard.
+- **48kHz f32 Internal Bus** — Audio is processed internally at 48kHz in 32-bit float. Encoding/decoding to wire format (Opus, G.722, PCMU, PCMA) happens only at the leg boundary.
+- **Per-Session Codec Isolation** — Each call leg gets its own encoder/decoder/resampler/denoiser state — no cross-call corruption.
+- **SDP Codec Negotiation** — Outbound encoding uses the codec actually negotiated in SDP answers, not just the first offered codec.

 ---

@@ -70,15 +78,16 @@ Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware dev

 - **Node.js** ≥ 20 with `tsx` globally available
 - **pnpm** for package management
-- **Rust** toolchain (for building the codec engine and TTS)
+- **Rust** toolchain (for building the proxy engine)
+- **espeak-ng** (optional, for TTS fallback)

 ### Install & Build

 ```bash
-# Clone and install
+# Clone and install dependencies
 pnpm install

-# Build the Rust binaries (opus-codec + tts-engine)
+# Build the Rust proxy-engine binary
 pnpm run buildRust

 # Bundle the web frontend
@@ -87,57 +96,92 @@ pnpm run bundle

 ### Configuration

-Create `.nogit/config.json` with your setup:
+Create `.nogit/config.json`:

 ```jsonc
 {
  "proxy": {
-    "lanIp": "192.168.1.100",     // Your server's LAN IP
-    "lanPort": 5070,               // SIP signaling port
-    "rtpPortRange": [20000, 20200],// RTP relay port pool (even ports)
-    "webUiPort": 3060              // Dashboard port
+    "lanIp": "192.168.1.100",          // Your server's LAN IP
+    "lanPort": 5070,                    // SIP signaling port
+    "publicIpSeed": "stun.example.com", // STUN server for public IP discovery
+    "rtpPortRange": { "min": 20000, "max": 20200 }, // RTP port pool (even ports)
+    "webUiPort": 3060                   // Dashboard + REST API port
  },
  "providers": [
    {
      "id": "my-trunk",
-      "name": "My SIP Provider",
-      "host": "sip.provider.com",
-      "port": 5060,
+      "displayName": "My SIP Provider",
+      "domain": "sip.provider.com",
+      "outboundProxy": { "address": "sip.provider.com", "port": 5060 },
      "username": "user",
      "password": "pass",
-      "codecs": ["G.722", "PCMA", "PCMU"],
-      "registerExpiry": 3600
+      "codecs": [9, 0, 8, 101],        // G.722, PCMU, PCMA, telephone-event
+      "registerIntervalSec": 300
    }
  ],
  "devices": [
    {
      "id": "desk-phone",
-      "name": "Desk Phone",
-      "type": "sip"
+      "displayName": "Desk Phone",
+      "expectedAddress": "192.168.1.50",
+      "extension": "100"
    }
  ],
  "routing": {
-    "inbound": {
-      "default": { "target": "all-devices", "ringBrowser": true }
+    "routes": [
+      {
+        "id": "inbound-default",
+        "name": "Ring all devices",
+        "priority": 100,
+        "direction": "inbound",
+        "match": {},
+        "action": {
+          "targets": ["desk-phone"],
+          "ringBrowsers": true,
+          "voicemailBox": "main",
+          "noAnswerTimeout": 25
+        }
+      },
+      {
+        "id": "outbound-default",
+        "name": "Route via trunk",
+        "priority": 100,
+        "direction": "outbound",
+        "match": {},
+        "action": { "provider": "my-trunk" }
+      }
+    ]
+  },
+  "voiceboxes": [
+    {
+      "id": "main",
+      "enabled": true,
+      "greetingText": "Please leave a message after the beep.",
+      "greetingVoice": "af_bella",
+      "noAnswerTimeoutSec": 25,
+      "maxRecordingSec": 120,
+      "maxMessages": 50
    }
-  }
+  ],
+  "contacts": [
+    { "id": "1", "name": "Alice", "number": "+491234567890", "starred": true }
+  ]
 }
 ```

 ### TTS Setup (Optional)

-For neural "connecting your call" announcements, download the Kokoro TTS model:
+For neural announcements and voicemail greetings, download the Kokoro TTS model:

 ```bash
 mkdir -p .nogit/tts
-# Download the full-quality model (310MB) + voices (27MB)
 curl -L -o .nogit/tts/kokoro-v1.0.onnx \
  https://github.com/mzdk100/kokoro/releases/download/V1.0/kokoro-v1.0.onnx
 curl -L -o .nogit/tts/voices.bin \
  https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin
 ```

-If the model files aren't present, the announcement feature is simply disabled — everything else works fine.
+Without the model files, TTS falls back to `espeak-ng`. Without either, announcements are skipped — everything else works fine.

 ### Run

@@ -145,7 +189,7 @@ If the model files aren't present, the announcement feature is simply disabled
 pnpm start
 ```

-The SIP proxy starts on the configured port and the web dashboard is available at `http://<your-ip>:3060`.
+The SIP proxy starts on the configured port, and the web dashboard is available at `http://<your-ip>:3060` (or `https://<your-ip>:3060` once TLS certificates are configured as described below).

 ### HTTPS (Optional)

@@ -157,68 +201,91 @@ Place `cert.pem` and `key.pem` in `.nogit/` for TLS on the dashboard.

 ```
 siprouter/
-├── ts/                        # TypeScript source
-│   ├── sipproxy.ts            # Main entry — bootstraps everything
-│   ├── config.ts              # Config loader & validation
-│   ├── registrar.ts           # Local SIP registrar for devices
-│   ├── providerstate.ts       # Per-provider upstream registration engine
-│   ├── frontend.ts            # Web dashboard HTTP/WS server + REST API
-│   ├── webrtcbridge.ts        # WebRTC signaling layer
-│   ├── opusbridge.ts          # Rust IPC bridge (smartrust)
-│   ├── codec.ts               # High-level RTP transcoding interface
-│   ├── announcement.ts        # Neural TTS announcement generator
-│   ├── sip/                   # Zero-dependency SIP protocol library
-│   │   ├── message.ts         #   SIP message parser/builder/mutator
-│   │   ├── dialog.ts          #   RFC 3261 dialog state machine
-│   │   ├── helpers.ts         #   SDP builder, digest auth, codec registry
-│   │   └── rewrite.ts         #   SIP URI + SDP body rewriting
-│   └── call/                  # Hub-model call management
-│       ├── call-manager.ts    #   Central registry, factory, routing
-│       ├── call.ts            #   Call hub — owns N legs, media fan-out
-│       ├── sip-leg.ts         #   SIP device/provider connection
-│       ├── webrtc-leg.ts      #   Browser WebRTC connection
-│       └── rtp-port-pool.ts   #   UDP port allocation
-├── ts_web/                    # Web frontend (Lit-based SPA)
-│   ├── elements/              #   Web components (dashboard, phone, etc.)
-│   └── state/                 #   App state, WebRTC client, notifications
-├── rust/                      # Rust workspace
+├── ts/                            # TypeScript control plane
+│   ├── sipproxy.ts                # Main entry — bootstraps everything
+│   ├── config.ts                  # Config loader & validation
+│   ├── proxybridge.ts             # Rust proxy-engine IPC bridge (smartrust)
+│   ├── frontend.ts                # Web dashboard HTTP/WS server + REST API
+│   ├── webrtcbridge.ts            # WebRTC signaling layer
+│   ├── registrar.ts               # Browser softphone registration
+│   ├── announcement.ts            # TTS announcement generator (espeak-ng / Kokoro)
+│   ├── voicebox.ts                # Voicemail box management
+│   └── call/
+│       └── prompt-cache.ts        # Named audio prompt WAV management
+│
+├── ts_web/                        # Web frontend (Lit-based SPA)
+│   ├── elements/                  # Web components (9 dashboard views)
+│   └── state/                     # App state, WebRTC client, notifications
+│
+├── rust/                          # Rust workspace (the data plane)
 │   └── crates/
-│       ├── opus-codec/        #   Real-time audio transcoder (Opus/G.722/PCM)
-│       └── tts-engine/        #   Kokoro neural TTS CLI
-├── html/                      # Static HTML shell
-├── .nogit/                    # Secrets, config, models (gitignored)
-└── dist_rust/                 # Compiled Rust binaries (gitignored)
+│       ├── codec-lib/             # Audio codec library (Opus/G.722/PCMU/PCMA)
+│       ├── sip-proto/             # Zero-dependency SIP protocol library
+│       └── proxy-engine/          # Main binary — SIP engine + mixer + RTP
+│
+├── html/                          # Static HTML shell
+├── .nogit/                        # Secrets, config, TTS models (gitignored)
+└── dist_rust/                     # Compiled Rust binary (gitignored)
 ```

 ---

-## 🎧 Codec Engine (Rust)
+## 🎧 Audio Engine (Rust)

-The `opus-codec` binary handles all real-time audio processing via a JSON-over-stdio IPC protocol:
+The `proxy-engine` binary handles all real-time audio processing with a **48kHz f32 internal bus**; encoding and decoding happen only at leg boundaries.

-| Codec | Payload Type | Sample Rate | Use Case |
-|-------|-------------|-------------|----------|
-| **Opus** | 111 | 48 kHz | WebRTC browsers |
-| **G.722** | 9 | 16 kHz | HD SIP devices |
+### Supported Codecs
+
+| Codec | PT | Native Rate | Use Case |
+|-------|:--:|:-----------:|----------|
+| **Opus** | 111 | 48 kHz | WebRTC browsers (native float encode/decode — zero i16 quantization) |
+| **G.722** | 9 | 16 kHz | HD SIP devices & providers |
 | **PCMU** (G.711 µ-law) | 0 | 8 kHz | Legacy SIP |
 | **PCMA** (G.711 A-law) | 8 | 8 kHz | Legacy SIP |

-**Features:**
-- Per-call isolated codec sessions (no cross-call state corruption)
-- FFT-based sample rate conversion via `rubato`
-- **RNNoise ML noise suppression** with per-direction state — denoises audio flowing to SIP separately from audio flowing to the browser
-- Raw PCM encoding for TTS frame processing
+### Audio Pipeline
+
+```
+Inbound:   Wire RTP → Decode → Resample to 48kHz → Denoise (RNNoise) → Mix Bus
+Outbound:  Mix Bus → Mix-Minus → Resample to codec rate → Encode → Wire RTP
+```
+
+- **FFT-based resampling** via `rubato` (`FftFixedIn`) — cached resamplers process fixed 20ms chunks, so filter state carries over seamlessly between frames
+- **ML noise suppression** via `nnnoiseless` (RNNoise) — per-leg inbound denoising with SIMD acceleration (AVX/SSE). Skipped for WebRTC legs (browsers already denoise via getUserMedia)
+- **Mix-minus mixing** — each participant hears everyone except themselves, accumulated in f64 precision
+- **In-tick packet reorder** — inbound RTP packets are sorted by sequence number before decoding, protecting G.722 ADPCM state from out-of-order delivery
+- **RFC 3550 compliant header parsing** — properly handles CSRC lists and header extensions

 ---

-## 🗣️ Neural TTS (Rust)
+## 🗣️ Neural TTS

-The `tts-engine` binary uses [Kokoro TTS](https://github.com/mzdk100/kokoro) (82M parameter neural model) to synthesize announcements at startup:
+Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M-parameter neural model running via ONNX Runtime directly in the Rust process:

 - **24 kHz, 16-bit mono** output
 - **25+ voice presets** — American/British, male/female (e.g., `af_bella`, `am_adam`, `bf_emma`, `bm_george`)
-- **~800ms** synthesis time for a 3-second announcement
-- Pre-encoded to G.722 + Opus for zero-latency RTP playback during call setup
+- **~800ms** synthesis time for a 3-second phrase
+- Lazy-loaded on first use — no startup cost if TTS is unused
+- Falls back to `espeak-ng` if the ONNX model is not available
+
+---
+
+## 📧 Voicemail
+
+- Configurable voicemail boxes with custom TTS greetings
+- Automatic routing on no-answer timeout
+- Recording with configurable max duration and message count
+- Web dashboard playback and management
+- WAV storage in `.nogit/voicemail/`
+
+---
+
+## 🔢 IVR (Interactive Voice Response)
+
+- DTMF-navigable menus with configurable entries
+- Actions: route to extension, route to voicemail, transfer, submenu, hangup, repeat prompt
+- Custom TTS prompts per menu
+- Nested menu support

 ---

@@ -228,33 +295,42 @@ The `tts-engine` binary uses [Kokoro TTS](https://github.com/mzdk100/kokoro) (82

 | View | Description |
 |------|-------------|
-| **Overview** | Stats tiles — uptime, providers, devices, active calls |
-| **Calls** | Active calls with leg details, codec info, packet counters. Add/remove legs, transfer, hangup |
-| **Phone** | Browser softphone — mic/speaker selection, audio meters, dial pad, incoming call popup |
-| **Contacts** | Contact management with click-to-call |
-| **Providers** | SIP trunk config with registration status |
-| **Log** | Live streaming log viewer |
+| 📊 **Overview** | Stats tiles — uptime, providers, devices, active calls |
+| 📞 **Calls** | Active calls with leg details, codec info, add/remove legs, transfer, hangup |
+| ☎️ **Phone** | Browser softphone — mic/speaker selection, audio meters, dial pad, incoming call popup |
+| 🔀 **Routes** | Routing rule management — match/action model with priority |
+| 📧 **Voicemail** | Voicemail box management + message playback |
+| 🔢 **IVR** | IVR menu builder — DTMF entries, TTS prompts, nested menus |
+| 👤 **Contacts** | Contact management with click-to-call |
+| 🔌 **Providers** | SIP trunk configuration and registration status |
+| 📋 **Log** | Live streaming log viewer |

 ### REST API

 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/api/status` | GET | Full system status (providers, devices, calls) |
+| `/api/status` | GET | Full system status (providers, devices, calls, history) |
 | `/api/call` | POST | Originate a call |
 | `/api/hangup` | POST | Hang up a call |
-| `/api/call/:id/addleg` | POST | Add a leg to an active call |
-| `/api/call/:id/addexternal` | POST | Add an external participant |
+| `/api/call/:id/addleg` | POST | Add a device leg to an active call |
+| `/api/call/:id/addexternal` | POST | Add an external participant via provider |
 | `/api/call/:id/removeleg` | POST | Remove a leg from a call |
 | `/api/transfer` | POST | Transfer a call |
-| `/api/config` | GET/POST | Read or update configuration (hot-reload) |
+| `/api/config` | GET | Read current configuration |
+| `/api/config` | POST | Update configuration (hot-reload) |
+| `/api/voicemail/:box` | GET | List voicemail messages |
+| `/api/voicemail/:box/:id` | DELETE | Delete a voicemail message |
+| `/api/voicemail/:box/:id/audio` | GET | Stream voicemail audio |

 ### WebSocket Events

 Connect to `/ws` for real-time push:

 ```jsonc
-{ "type": "status", "data": { ... } }       // Full status snapshot (1s interval)
+{ "type": "status", "data": { ... } }           // Full status snapshot (1s interval)
 { "type": "log", "data": { "message": "..." } } // Log lines in real-time
+{ "type": "incoming_call", "data": { ... } }     // Incoming call notification
+{ "type": "call_ended", "data": { ... } }        // Call ended notification
 ```

 ---
@@ -264,7 +340,7 @@ Connect to `/ws` for real-time push:
 | Port | Protocol | Purpose |
 |------|----------|---------|
 | 5070 (configurable) | UDP | SIP signaling |
-| 20000–20200 (configurable) | UDP | RTP relay (even ports, per-call allocation) |
+| 20000–20200 (configurable) | UDP | RTP media (even ports, per-call allocation) |
 | 3060 (configurable) | TCP | Web dashboard + WebSocket + REST API |

 ---
@@ -275,23 +351,16 @@ Connect to `/ws` for real-time push:
 # Start in dev mode
 pnpm start

-# Build Rust crates
+# Build Rust proxy-engine
 pnpm run buildRust

 # Bundle web frontend
 pnpm run bundle

-# Restart background server (build + bundle + restart)
+# Build + bundle + restart background server
 pnpm run restartBackground
 ```

-### Key Design Decisions
-
-- **Hub Model** — Calls are N-leg hubs, not point-to-point. This enables multi-party, dynamic leg manipulation, and transfer without tearing down the call.
-- **Zero-dependency SIP library** — `ts/sip/` is a pure data-level SIP stack (parse/build/mutate/serialize). No transport or timer logic — those live in the application layer.
-- **Rust for the hot path** — Codec transcoding and noise suppression run in native Rust for real-time performance. TypeScript handles signaling and orchestration.
-- **Per-session codec isolation** — Each call gets its own Opus/G.722 encoder/decoder state in the Rust process, preventing stateful codec prediction from leaking between concurrent calls.
-
 ---

 ## License and Legal Information
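The readme describes mix-minus mixing accumulated in f64 precision. Below is a minimal sketch of that idea under a few assumptions. Legs carry 48kHz f32 frames, and shorter frames are treated as zero-padded. A plain clamp stands in for whatever limiting the engine actually applies. It computes the total of all legs once, then subtracts each leg's own contribution. `mix_minus` is a hypothetical name, not the proxy-engine API.

```rust
/// Mix-minus: each leg receives the sum of every *other* leg's audio.
/// Accumulates in f64, then clamps back to [-1.0, 1.0] as f32.
pub fn mix_minus(legs: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let frame_len = legs.iter().map(Vec::len).max().unwrap_or(0);

    // Per-sample total of all legs, kept in f64 to limit rounding error.
    let mut total = vec![0.0f64; frame_len];
    for leg in legs {
        for (acc, &s) in total.iter_mut().zip(leg) {
            *acc += s as f64;
        }
    }

    // Each output = total minus the leg's own sample (missing samples count as 0).
    legs.iter()
        .map(|own| {
            (0..frame_len)
                .map(|i| {
                    let own_s = own.get(i).copied().unwrap_or(0.0) as f64;
                    (total[i] - own_s).clamp(-1.0, 1.0) as f32
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}
```

Subtracting from a shared total costs O(legs × samples), whereas summing every other leg separately costs O(legs² × samples). Doing the subtraction in f64 keeps the rounding residue of a leg's own voice negligible.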
--- a/rust/crates/codec-lib/src/lib.rs
+++ b/rust/crates/codec-lib/src/lib.rs
@@ -142,8 +142,10 @@ impl TranscodeState {
    }

    /// High-quality sample rate conversion using rubato FFT resampler.
-    /// Resamplers are cached by (from_rate, to_rate, chunk_size) and reused,
-    /// maintaining proper inter-frame state for continuous audio streams.
+    ///
+    /// To maintain continuous filter state, the resampler always processes at a
+    /// canonical chunk size (20ms at the source rate). This prevents cache
+    /// thrashing from variable input sizes and preserves inter-frame filter state.
    pub fn resample(
        &mut self,
        pcm: &[i16],
@@ -154,28 +156,61 @@ impl TranscodeState {
            return Ok(pcm.to_vec());
        }

-        let chunk = pcm.len();
-        let key = (from_rate, to_rate, chunk);
+        let canonical_chunk = (from_rate as usize) / 50; // 20ms
+        let key = (from_rate, to_rate, canonical_chunk);

        if !self.resamplers.contains_key(&key) {
-            let r =
-                FftFixedIn::<f64>::new(from_rate as usize, to_rate as usize, chunk, 1, 1)
-                    .map_err(|e| format!("resampler {from_rate}->{to_rate}: {e}"))?;
+            let r = FftFixedIn::<f64>::new(
+                from_rate as usize,
+                to_rate as usize,
+                canonical_chunk,
+                1,
+                1,
+            )
+            .map_err(|e| format!("resampler {from_rate}->{to_rate}: {e}"))?;
            self.resamplers.insert(key, r);
        }
        let resampler = self.resamplers.get_mut(&key).unwrap();

-        let float_in: Vec<f64> = pcm.iter().map(|&s| s as f64 / 32768.0).collect();
-        let input = vec![float_in];
+        let mut output = Vec::with_capacity(
+            (pcm.len() as f64 * to_rate as f64 / from_rate as f64).ceil() as usize + 16,
+        );

-        let result = resampler
-            .process(&input, None)
-            .map_err(|e| format!("resample {from_rate}->{to_rate}: {e}"))?;
+        let mut offset = 0;
+        while offset < pcm.len() {
+            let remaining = pcm.len() - offset;
+            let copy_len = remaining.min(canonical_chunk);
+            let mut chunk = vec![0.0f64; canonical_chunk];
+            for (dst, &src) in chunk.iter_mut().zip(&pcm[offset..offset + copy_len]) {
+                *dst = src as f64 / 32768.0;
+            }

-        Ok(result[0]
-            .iter()
-            .map(|&s| (s * 32767.0).round().clamp(-32768.0, 32767.0) as i16)
-            .collect())
+            let input = vec![chunk];
+            let result = resampler
+                .process(&input, None)
+                .map_err(|e| format!("resample {from_rate}->{to_rate}: {e}"))?;
+
+            if remaining < canonical_chunk {
+                let expected =
+                    (copy_len as f64 * to_rate as f64 / from_rate as f64).round() as usize;
+                let take = expected.min(result[0].len());
+                output.extend(
+                    result[0][..take]
+                        .iter()
+                        .map(|&s| (s * 32767.0).round().clamp(-32768.0, 32767.0) as i16),
+                );
+            } else {
+                output.extend(
+                    result[0]
+                        .iter()
+                        .map(|&s| (s * 32767.0).round().clamp(-32768.0, 32767.0) as i16),
+                );
+            }
+
+            offset += canonical_chunk;
+        }
+
+        Ok(output)
    }

    /// Apply RNNoise ML noise suppression to 48kHz PCM audio.
@@ -329,6 +364,21 @@ impl TranscodeState {
        }
    }

+    /// Opus packet loss concealment — synthesize one frame to fill a gap.
+    /// Returns f32 PCM at 48kHz. `frame_size` is in samples per channel; use 960 for a 20ms frame.
+    pub fn opus_plc(&mut self, frame_size: usize) -> Result<Vec<f32>, String> {
+        let mut pcm = vec![0.0f32; frame_size];
+        let out = MutSignals::try_from(&mut pcm[..])
+            .map_err(|e| format!("opus plc signals: {e}"))?;
+        let n: usize = self
+            .opus_dec
+            .decode_float(None::<OpusPacket<'_>>, out, false)
+            .map_err(|e| format!("opus plc: {e}"))?
+            .into();
+        pcm.truncate(n);
+        Ok(pcm)
+    }
+
    /// Encode f32 PCM samples ([-1.0, 1.0]) to an audio codec.
    ///
    /// For Opus, uses native float encode (no i16 quantization).
@@ -357,7 +407,10 @@ impl TranscodeState {
    }

    /// High-quality sample rate conversion for f32 PCM using rubato FFT resampler.
-    /// Uses a separate cache from the i16 resampler.
+    ///
+    /// To maintain continuous filter state, the resampler always processes at a
+    /// canonical chunk size (20ms at the source rate). This prevents cache
+    /// thrashing from variable input sizes and preserves inter-frame filter state.
    pub fn resample_f32(
        &mut self,
        pcm: &[f32],
@@ -368,23 +421,50 @@ impl TranscodeState {
            return Ok(pcm.to_vec());
        }

-        let chunk = pcm.len();
-        let key = (from_rate, to_rate, chunk);
+        let canonical_chunk = (from_rate as usize) / 50; // 20ms
+        let key = (from_rate, to_rate, canonical_chunk);

        if !self.resamplers_f32.contains_key(&key) {
-            let r =
-                FftFixedIn::<f32>::new(from_rate as usize, to_rate as usize, chunk, 1, 1)
-                    .map_err(|e| format!("resampler f32 {from_rate}->{to_rate}: {e}"))?;
+            let r = FftFixedIn::<f32>::new(
+                from_rate as usize,
+                to_rate as usize,
+                canonical_chunk,
+                1,
+                1,
+            )
+            .map_err(|e| format!("resampler f32 {from_rate}->{to_rate}: {e}"))?;
            self.resamplers_f32.insert(key, r);
        }
        let resampler = self.resamplers_f32.get_mut(&key).unwrap();

-        let input = vec![pcm.to_vec()];
-        let result = resampler
-            .process(&input, None)
-            .map_err(|e| format!("resample f32 {from_rate}->{to_rate}: {e}"))?;
+        let mut output = Vec::with_capacity(
+            (pcm.len() as f64 * to_rate as f64 / from_rate as f64).ceil() as usize + 16,
+        );

-        Ok(result[0].clone())
+        let mut offset = 0;
+        while offset < pcm.len() {
+            let remaining = pcm.len() - offset;
+            let mut chunk = vec![0.0f32; canonical_chunk];
+            let copy_len = remaining.min(canonical_chunk);
+            chunk[..copy_len].copy_from_slice(&pcm[offset..offset + copy_len]);
+
+            let input = vec![chunk];
+            let result = resampler
+                .process(&input, None)
+                .map_err(|e| format!("resample f32 {from_rate}->{to_rate}: {e}"))?;
+
+            if remaining < canonical_chunk {
+                let expected =
+                    (copy_len as f64 * to_rate as f64 / from_rate as f64).round() as usize;
+                output.extend_from_slice(&result[0][..expected.min(result[0].len())]);
+            } else {
+                output.extend_from_slice(&result[0]);
+            }
+
+            offset += canonical_chunk;
+        }
+
+        Ok(output)
    }

    /// Apply RNNoise ML noise suppression to 48kHz f32 PCM audio.
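Both `resample` and `resample_f32` now process input in canonical 20ms chunks of `from_rate / 50` samples. They zero-pad the last partial chunk and keep only its proportional share of output. A hypothetical `plan_chunks` helper can check the arithmetic of that loop without `rubato`. It assumes each full-chunk call returns `canonical_chunk * to_rate / from_rate` samples. That holds for integer-related rates such as 8k, 16k, and 48kHz, but it is an assumption of this sketch, since `FftFixedIn` does not promise a constant output size in general.

```rust
/// One iteration of the chunk loop: real input samples copied into the
/// zero-padded 20ms chunk, and output samples kept from that call.
#[derive(Debug, PartialEq)]
pub struct ChunkStep {
    pub copy_len: usize,
    pub keep: usize,
}

/// Hypothetical planner mirroring the offset/padding/trim logic of the resamplers.
pub fn plan_chunks(input_len: usize, from_rate: u32, to_rate: u32) -> Vec<ChunkStep> {
    let canonical_chunk = (from_rate as usize) / 50; // 20ms at the source rate
    assert!(canonical_chunk > 0, "source rate must be at least 50 Hz");
    // Assumed output size of one full-chunk resampler call.
    let full_out = canonical_chunk * to_rate as usize / from_rate as usize;

    let mut steps = Vec::new();
    let mut offset = 0;
    while offset < input_len {
        let remaining = input_len - offset;
        let copy_len = remaining.min(canonical_chunk);
        let keep = if remaining < canonical_chunk {
            // Partial tail: keep only the output corresponding to real input.
            let expected = (copy_len as f64 * to_rate as f64 / from_rate as f64).round() as usize;
            expected.min(full_out)
        } else {
            full_out
        };
        steps.push(ChunkStep { copy_len, keep });
        offset += canonical_chunk;
    }
    steps
}
```

The zero-padded tail still passes through the same cached resampler, so the padding does enter its filter history. Callers that always hand over whole 20ms frames never hit that path.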
--- a/rust/crates/proxy-engine/src/jitter_buffer.rs
+++ b/rust/crates/proxy-engine/src/jitter_buffer.rs
@@ -0,0 +1,188 @@
+//! Per-leg adaptive jitter buffer for the audio mixer.
+//!
+//! Sits between inbound RTP packet reception and the mixer's decode step.
+//! Reorders packets by sequence number and delivers exactly one frame per
+//! 20ms mixer tick, smoothing out network jitter. When a packet is missing,
+//! the mixer can invoke codec PLC to conceal the gap.
+
+use crate::mixer::RtpPacket;
+use std::collections::BTreeMap;
+
+/// Per-leg jitter buffer. Collects RTP packets keyed by sequence number,
+/// delivers one frame per 20ms tick in sequence order.
+///
+/// Adaptive target depth: starts at 3 frames (60ms), adjusts between
+/// 2–6 frames based on observed jitter.
+pub struct JitterBuffer {
+    /// Packets waiting for playout, keyed by seq number.
+    buffer: BTreeMap<u16, RtpPacket>,
+    /// Next expected sequence number for playout.
+    next_seq: Option<u16>,
+    /// Target buffer depth in frames (adaptive).
+    target_depth: u32,
+    /// Fill-level high-water mark since the last adaptation (reset by `adapt`).
+    max_fill_seen: u32,
+    /// Ticks since last adaptation adjustment.
+    adapt_counter: u32,
+    /// Consecutive ticks where the expected packet was missing (for ramp-up).
+    empty_streak: u32,
+    /// Consecutive delivered frames with fill above target_depth + 2 (for ramp-down).
+    excess_streak: u32,
+    /// Whether we've started playout (initial fill complete).
+    playing: bool,
+    /// Number of frames consumed since start (for stats).
+    frames_consumed: u64,
+    /// Number of frames lost (gap in sequence).
+    frames_lost: u64,
+}
+
+/// What the mixer gets back each tick.
+pub enum JitterResult {
+    /// A packet is available for decoding.
+    Packet(RtpPacket),
+    /// Packet was expected but missing — invoke PLC.
+    Missing,
+    /// Buffer is in initial fill phase — output silence.
+    Filling,
+}
+
+impl JitterBuffer {
+    pub fn new() -> Self {
+        Self {
+            buffer: BTreeMap::new(),
+            next_seq: None,
+            target_depth: 3, // 60ms initial target
+            max_fill_seen: 0,
+            adapt_counter: 0,
+            empty_streak: 0,
+            excess_streak: 0,
+            playing: false,
+            frames_consumed: 0,
+            frames_lost: 0,
+        }
+    }
+
+    /// Push a received RTP packet into the buffer.
+    pub fn push(&mut self, pkt: RtpPacket) {
+        // Ignore duplicates.
+        if self.buffer.contains_key(&pkt.seq) {
+            return;
+        }
+
+        // Detect large forward seq jump (hold/resume, SSRC change).
+        if let Some(next) = self.next_seq {
+            let jump = pkt.seq.wrapping_sub(next);
+            if jump > 1000 && jump < 0x8000 {
+                // Massive forward jump — reset buffer.
+                self.reset();
+                self.next_seq = Some(pkt.seq);
+            }
+        }
+
+        if self.next_seq.is_none() {
+            self.next_seq = Some(pkt.seq);
+        }
+
+        self.buffer.insert(pkt.seq, pkt);
+    }
+
+    /// Consume one frame for the current 20ms tick.
+    /// Called once per mixer tick per leg.
+    pub fn consume(&mut self) -> JitterResult {
+        // Track fill level for adaptation.
+        let fill = self.buffer.len() as u32;
+        if fill > self.max_fill_seen {
+            self.max_fill_seen = fill;
+        }
+
+        // Initial fill phase: wait until we have target_depth packets.
+        if !self.playing {
+            if fill >= self.target_depth {
+                self.playing = true;
+            } else {
+                return JitterResult::Filling;
+            }
+        }
+
+        let seq = match self.next_seq {
+            Some(s) => s,
+            None => return JitterResult::Filling,
+        };
+
+        // Advance next_seq (wrapping u16).
+        self.next_seq = Some(seq.wrapping_add(1));
+
+        // Try to pull the expected sequence number.
+        if let Some(pkt) = self.buffer.remove(&seq) {
+            self.frames_consumed += 1;
+            self.empty_streak = 0;
+
+            // Adaptive: if buffer is consistently deep, we can tighten.
+            if fill > self.target_depth + 2 {
+                self.excess_streak += 1;
+            } else {
+                self.excess_streak = 0;
+            }
+
+            JitterResult::Packet(pkt)
+        } else {
+            // Packet missing — PLC needed.
+            self.frames_lost += 1;
+            self.empty_streak += 1;
+            self.excess_streak = 0;
+
+            JitterResult::Missing
+        }
+    }
+
+    /// Run adaptation logic. Call every tick; internally gates to ~1s intervals.
+    pub fn adapt(&mut self) {
+        self.adapt_counter += 1;
+        if self.adapt_counter < 50 {
+            return;
+        }
+        self.adapt_counter = 0;
+
+        // If more than 3 consecutive frames are missing right now, increase depth.
+        if self.empty_streak > 3 && self.target_depth < 6 {
+            self.target_depth += 1;
+        }
+        // If more than 25 consecutive frames arrived with excess fill, decrease depth.
+        else if self.excess_streak > 25 && self.target_depth > 2 {
+            self.target_depth -= 1;
+        }
+
+        self.max_fill_seen = 0;
+    }
+
+    /// Discard packets that are too old (seq far behind next_seq).
+    /// Prevents unbounded memory growth from reordered/late packets.
+    pub fn prune_stale(&mut self) {
+        if let Some(next) = self.next_seq {
+            // Remove anything more than 100 frames behind playout point.
+            // Use wrapping arithmetic: if (next - seq) > 100, it's stale.
+            let stale: Vec<u16> = self
+                .buffer
+                .keys()
+                .filter(|&&seq| {
+                    let age = next.wrapping_sub(seq);
+                    age > 100 && age < 0x8000 // < 0x8000 means it's actually behind, not ahead
+                })
+                .copied()
+                .collect();
+            for seq in stale {
+                self.buffer.remove(&seq);
+            }
+        }
+    }
+
+    /// Reset playout state (e.g., after re-INVITE / hold-resume); keeps target_depth and frame counters.
+    pub fn reset(&mut self) {
+        self.buffer.clear();
+        self.next_seq = None;
+        self.playing = false;
+        self.empty_streak = 0;
+        self.excess_streak = 0;
+        self.adapt_counter = 0;
+    }
+}
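`push` and `prune_stale` both compare 16-bit RTP sequence numbers with wrapping subtraction, treating a difference below 0x8000 as "ahead" and anything else as "behind". The hypothetical `classify` helper below reproduces those thresholds (reset on a forward jump above 1000, prune when more than 100 behind) so the wraparound cases can be checked in isolation. It is a sketch for illustration, not part of the module.

```rust
/// Where a sequence number sits relative to the next playout sequence
/// number (hypothetical helper, not part of jitter_buffer.rs).
#[derive(Debug, PartialEq)]
enum SeqPosition {
    /// 0..=1000 ahead of the playout point: buffered normally.
    Ahead(u16),
    /// 1001..=0x7FFF ahead: `push` treats this as a jump and resets.
    LargeJump,
    /// Behind the playout point but not removed by `prune_stale`.
    Late(u16),
    /// More than 100 behind: `prune_stale` discards it.
    Stale,
}

/// Classify `seq` against `next` using the same wrapping u16 arithmetic
/// and thresholds as `push` and `prune_stale`.
fn classify(next: u16, seq: u16) -> SeqPosition {
    // Forward distance modulo 2^16 (same as `jump` in `push`).
    let ahead = seq.wrapping_sub(next);
    if ahead < 0x8000 {
        if ahead > 1000 {
            SeqPosition::LargeJump
        } else {
            SeqPosition::Ahead(ahead)
        }
    } else {
        // Backward distance (same as `age` in `prune_stale`).
        let behind = next.wrapping_sub(seq);
        if behind > 100 && behind < 0x8000 {
            SeqPosition::Stale
        } else {
            SeqPosition::Late(behind)
        }
    }
}

fn main() {
    // 65535 -> 0 wraps to one step ahead.
    println!("{:?}", classify(65535, 0));
    // 5000 ahead of playout: push() would reset the buffer.
    println!("{:?}", classify(10, 5010));
    // 50 behind: late, kept in the map until it is pruned.
    println!("{:?}", classify(100, 50));
    // 150 behind across the wrap: pruned.
    println!("{:?}", classify(100, 65486));
}
```

Note that a "late" packet is never played out once `next_seq` has moved past it; it only occupies the map (and counts toward `fill`) until `prune_stale` removes it.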
--- a/rust/crates/proxy-engine/src/main.rs
+++ b/rust/crates/proxy-engine/src/main.rs
@@ -12,6 +12,7 @@ mod call_manager;
 mod config;
 mod dtmf;
 mod ipc;
+mod jitter_buffer;
 mod leg_io;
 mod mixer;
 mod provider;
--- a/rust/crates/proxy-engine/src/mixer.rs
+++ b/rust/crates/proxy-engine/src/mixer.rs
@@ -15,6 +15,7 @@
 //! 6. Forward DTMF between participant legs only

 use crate::ipc::{emit_event, OutTx};
+use crate::jitter_buffer::{JitterBuffer, JitterResult};
 use crate::rtp::{build_rtp_header, rtp_clock_increment};
 use codec_lib::{codec_sample_rate, new_denoiser, TranscodeState};
 use nnnoiseless::DenoiseState;
@@ -164,6 +165,8 @@ struct MixerLegSlot {
    last_pcm_frame: Vec<f32>,
    /// Number of consecutive ticks with no inbound packet.
    silent_ticks: u32,
+    /// Per-leg jitter buffer for packet reordering and timing.
+    jitter: JitterBuffer,
    // RTP output state.
    rtp_seq: u16,
    rtp_ts: u32,
@@ -238,6 +241,7 @@ async fn mixer_loop(
                            rtp_ts: 0,
                            rtp_ssrc: rand::random(),
                            role: LegRole::Participant,
+                            jitter: JitterBuffer::new(),
                        },
                    );
                }
@@ -331,35 +335,27 @@ async fn mixer_loop(
        for lid in &leg_ids {
            let slot = legs.get_mut(lid).unwrap();

-            // Drain channel — collect DTMF separately, collect ALL audio packets.
-            let mut audio_packets: Vec<RtpPacket> = Vec::new();
+            // Step 2a: Drain all pending packets into the jitter buffer.
+            let mut got_audio = false;
            loop {
                match slot.inbound_rx.try_recv() {
                    Ok(pkt) => {
                        if pkt.payload_type == 101 {
-                            // DTMF telephone-event: collect for processing.
                            dtmf_forward.push((lid.clone(), pkt));
                        } else {
-                            audio_packets.push(pkt);
+                            got_audio = true;
+                            slot.jitter.push(pkt);
                        }
                    }
                    Err(_) => break,
                }
            }

-            if !audio_packets.is_empty() {
-                slot.silent_ticks = 0;
-
-                // Sort by sequence number for correct codec state progression.
-                // This prevents G.722 ADPCM state corruption from out-of-order packets.
-                audio_packets.sort_by_key(|p| p.seq);
-
-                // Decode ALL packets in order (maintains codec state),
-                // but only keep the last decoded frame for mixing.
-                for pkt in &audio_packets {
+            // Step 2b: Consume exactly one frame from the jitter buffer.
+            match slot.jitter.consume() {
+                JitterResult::Packet(pkt) => {
                    match slot.transcoder.decode_to_f32(&pkt.payload, pkt.payload_type) {
                        Ok((pcm, rate)) => {
-                            // Resample to 48kHz mixing rate if needed.
                            let pcm_48k = if rate == MIX_RATE {
                                pcm
                            } else {
@@ -367,15 +363,11 @@ async fn mixer_loop(
                                    .resample_f32(&pcm, rate, MIX_RATE)
                                    .unwrap_or_else(|_| vec![0.0f32; MIX_FRAME_SIZE])
                            };
-                            // Per-leg inbound denoising at 48kHz.
-                            // Only for SIP telephony legs — WebRTC browsers
-                            // already apply noise suppression via getUserMedia.
                            let processed = if slot.codec_pt != codec_lib::PT_OPUS {
                                TranscodeState::denoise_f32(&mut slot.denoiser, &pcm_48k)
                            } else {
                                pcm_48k
                            };
-                            // Pad or truncate to exactly MIX_FRAME_SIZE.
                            let mut frame = processed;
                            frame.resize(MIX_FRAME_SIZE, 0.0);
                            slot.last_pcm_frame = frame;
@@ -383,15 +375,43 @@ async fn mixer_loop(
                        Err(_) => {}
                    }
                }
-            } else if dtmf_forward.iter().any(|(src, _)| src == lid) {
-                // Got DTMF but no audio — don't bump silent_ticks (DTMF counts as activity).
+                JitterResult::Missing => {
+                    // Invoke Opus PLC or fade for non-Opus codecs.
+                    if slot.codec_pt == codec_lib::PT_OPUS {
+                        match slot.transcoder.opus_plc(MIX_FRAME_SIZE) {
+                            Ok(pcm) => {
+                                slot.last_pcm_frame = pcm;
+                            }
+                            Err(_) => {
+                                for s in slot.last_pcm_frame.iter_mut() {
+                                    *s *= 0.8;
+                                }
+                            }
+                        }
+                    } else {
+                        // Non-Opus: fade last frame toward silence.
+                        for s in slot.last_pcm_frame.iter_mut() {
+                            *s *= 0.85;
+                        }
+                    }
+                }
+                JitterResult::Filling => {
+                    slot.last_pcm_frame = vec![0.0f32; MIX_FRAME_SIZE];
+                }
+            }
+
+            // Run jitter adaptation + prune stale packets.
+            slot.jitter.adapt();
+            slot.jitter.prune_stale();
+
+            // Silent ticks: based on actual network reception, not jitter buffer state.
+            if got_audio || dtmf_forward.iter().any(|(src, _)| src == lid) {
                slot.silent_ticks = 0;
            } else {
                slot.silent_ticks += 1;
-                // After 150 ticks (3 seconds) of silence, zero out to avoid stale audio.
-                if slot.silent_ticks > 150 {
-                    slot.last_pcm_frame = vec![0.0f32; MIX_FRAME_SIZE];
-                }
+            }
+            if slot.silent_ticks > 150 {
+                slot.last_pcm_frame = vec![0.0f32; MIX_FRAME_SIZE];
            }
        }

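When the jitter buffer reports `Missing` for a non-Opus leg, the mixer multiplies the held frame by 0.85 on each tick (0.8 when Opus PLC itself fails), so concealed audio decays geometrically as gain^n. The sketch below uses hypothetical helper names and an assumed 960-sample frame (20 ms at the 48 kHz mix rate) to show how quickly that fade approaches silence: 0.85^20 is roughly 0.04, about -28 dB after 400 ms of consecutive loss.

```rust
/// Scale a held PCM frame toward silence (hypothetical helper mirroring
/// the non-Opus `JitterResult::Missing` branch in mixer.rs).
fn fade_in_place(frame: &mut [f32], gain: f32) {
    for s in frame.iter_mut() {
        *s *= gain;
    }
}

/// Peak absolute level after `lost` consecutive concealed ticks,
/// starting from a constant frame at `initial`.
fn peak_after_losses(initial: f32, lost: u32, gain: f32) -> f32 {
    // Assumed MIX_FRAME_SIZE: 20 ms at 48 kHz = 960 samples.
    let mut frame = vec![initial; 960];
    for _ in 0..lost {
        fade_in_place(&mut frame, gain);
    }
    frame.iter().fold(0.0f32, |peak, &s| peak.max(s.abs()))
}

fn main() {
    for lost in [1u32, 5, 10, 20] {
        let peak = peak_after_losses(1.0, lost, 0.85);
        let db = 20.0 * peak.log10();
        println!("{lost:>2} lost frames: peak {peak:.4} ({db:.1} dB)");
    }
}
```

The separate `silent_ticks` guard is independent of this fade: after 150 ticks (3 s) with no inbound audio or DTMF, the frame is zeroed outright.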
--- a/ts/00_commitinfo_data.ts
+++ b/ts/00_commitinfo_data.ts
@@ -3,6 +3,6 @@
 */
 export const commitinfo = {
  name: 'siprouter',
-  version: '1.17.2',
+  version: '1.19.0',
  description: 'undefined'
 }
--- a/ts_web/00_commitinfo_data.ts
+++ b/ts_web/00_commitinfo_data.ts
@@ -3,6 +3,6 @@
 */
 export const commitinfo = {
  name: 'siprouter',
-  version: '1.17.2',
+  version: '1.19.0',
  description: 'undefined'
 }
Author	SHA1	Message	Date
Juergen Kunz	c3a63a4092	v1.19.0	2026-04-10 21:15:34 +00:00
Juergen Kunz	7c4756402e	feat(proxy-engine,codec-lib): add adaptive RTP jitter buffering with Opus packet loss concealment and stable 20ms resampling	2026-04-10 21:15:34 +00:00
Juergen Kunz	b6950e11d2	v1.18.0	2026-04-10 17:25:34 +00:00
Juergen Kunz	e4935fbf21	feat(readme): expand documentation for voicemail, IVR, audio engine, and API capabilities	2026-04-10 17:25:34 +00:00