feat(proxy-engine): add on-demand TTS caching for voicemail and IVR prompts

2026-04-12 20:45:08 +00:00
parent cfadd7a2b6
commit 59d8c2557c
17 changed files with 460 additions and 488 deletions
--- a/readme.md
+++ b/readme.md
@@ -20,7 +20,7 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
 - 🎯 **Adaptive Jitter Buffer** — Per-leg jitter buffering with sequence-based reordering, adaptive depth (60–120ms), Opus PLC for lost packets, and hold/resume detection
 - 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
 - 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
- 🗣️ **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets, backed by espeak-ng fallback
+- 🗣️ **Neural TTS** — Kokoro-powered greetings and IVR prompts with 25+ voice presets
 - 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
 - 🖥️ **Web Dashboard** — Real-time SPA with 9 views: live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs

@@ -98,7 +98,6 @@ sequenceDiagram
 - **Node.js** ≥ 20 with `tsx` globally available
 - **pnpm** for package management
 - **Rust** toolchain (for building the proxy engine)
- **espeak-ng** (optional, for TTS fallback)

 ### Install & Build

@@ -190,7 +189,7 @@ Create `.nogit/config.json`:

 ### TTS Setup (Optional)

-For neural announcements and voicemail greetings, download the Kokoro TTS model:
+For neural voicemail greetings and IVR prompts, download the Kokoro TTS model:

 ```bash
 mkdir -p .nogit/tts
@@ -200,7 +199,7 @@ curl -L -o .nogit/tts/voices.bin \
  https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin
 ```

-Without the model files, TTS falls back to `espeak-ng`. Without either, announcements are skipped — everything else works fine.
+Without the model files, TTS prompts (IVR menus, voicemail greetings) are skipped — everything else works fine.

 ### Run

@@ -227,7 +226,6 @@ siprouter/
 │   ├── frontend.ts                # Web dashboard HTTP/WS server + REST API
 │   ├── webrtcbridge.ts            # WebRTC signaling layer
 │   ├── registrar.ts               # Browser softphone registration
-│   ├── announcement.ts            # TTS announcement generator (espeak-ng / Kokoro)
 │   ├── voicebox.ts                # Voicemail box management
 │   └── call/
 │       └── prompt-cache.ts        # Named audio prompt WAV management
@@ -288,13 +286,12 @@ flowchart LR

 ## 🗣️ Neural TTS

-Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
+Voicemail greetings and IVR prompts are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:

 - **24 kHz, 16-bit mono** output
 - **25+ voice presets** — American/British, male/female (e.g., `af_bella`, `am_adam`, `bf_emma`, `bm_george`)
 - **~800ms** synthesis time for a 3-second phrase
 - Lazy-loaded on first use — no startup cost if TTS is unused
- Falls back to `espeak-ng` if the ONNX model is not available

 ---