feat(proxy-engine): add on-demand TTS caching for voicemail and IVR prompts

2026-04-12 20:45:08 +00:00
parent cfadd7a2b6
commit 59d8c2557c
17 changed files with 460 additions and 488 deletions


@@ -20,7 +20,7 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
- 🎯 **Adaptive Jitter Buffer** — Per-leg jitter buffering with sequence-based reordering, adaptive depth (60-120ms), Opus PLC for lost packets, and hold/resume detection
- 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
- 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
- 🗣️ **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets, backed by espeak-ng fallback
- 🗣️ **Neural TTS** — Kokoro-powered greetings and IVR prompts with 25+ voice presets
- 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
- 🖥️ **Web Dashboard** — Real-time SPA with 9 views: live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs
@@ -98,7 +98,6 @@ sequenceDiagram
- **Node.js** ≥ 20 with `tsx` globally available
- **pnpm** for package management
- **Rust** toolchain (for building the proxy engine)
- **espeak-ng** (optional, for TTS fallback)
### Install & Build
@@ -190,7 +189,7 @@ Create `.nogit/config.json`:
### TTS Setup (Optional)
For neural announcements and voicemail greetings, download the Kokoro TTS model:
For neural voicemail greetings and IVR prompts, download the Kokoro TTS model:
```bash
mkdir -p .nogit/tts
@@ -200,7 +199,7 @@ curl -L -o .nogit/tts/voices.bin \
https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin
```
Without the model files, TTS falls back to `espeak-ng`. Without either, announcements are skipped — everything else works fine.
Without the model files, TTS prompts (IVR menus, voicemail greetings) are skipped — everything else works fine.
### Run
@@ -227,7 +226,6 @@ siprouter/
│ ├── frontend.ts # Web dashboard HTTP/WS server + REST API
│ ├── webrtcbridge.ts # WebRTC signaling layer
│ ├── registrar.ts # Browser softphone registration
│ ├── announcement.ts # TTS announcement generator (espeak-ng / Kokoro)
│ ├── voicebox.ts # Voicemail box management
│ └── call/
│ └── prompt-cache.ts # Named audio prompt WAV management
@@ -288,13 +286,12 @@ flowchart LR
## 🗣️ Neural TTS
Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
Voicemail greetings and IVR prompts are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
- **24 kHz, 16-bit mono** output
- **25+ voice presets** — American/British, male/female (e.g., `af_bella`, `am_adam`, `bf_emma`, `bm_george`)
- **~800ms** synthesis time for a 3-second phrase
- Lazy-loaded on first use — no startup cost if TTS is unused
- Falls back to `espeak-ng` if the ONNX model is not available
---
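
The commit title mentions on-demand TTS caching, but the README hunks above do not show how it is implemented. As a rough illustration only, the following TypeScript sketch shows one way a prompt could be synthesized on first request and reused afterwards. Everything in it is an assumption made for illustration: the class name `OnDemandPromptCache`, the `Synthesize` callback signature, and the in-memory `Map` are hypothetical and are not taken from `prompt-cache.ts` or the Rust engine.

```typescript
// Hypothetical synthesizer signature; the real engine API is not shown in this diff.
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>;

// Illustrative in-memory cache: a prompt is synthesized the first time it is
// requested and reused for every later request with the same text and voice.
class OnDemandPromptCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly synthesize: Synthesize;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  // The key covers both voice and text, so changing either yields a new entry.
  static key(text: string, voice: string): string {
    return JSON.stringify([voice, text]);
  }

  get(text: string, voice: string): Promise<Uint8Array> {
    const k = OnDemandPromptCache.key(text, voice);
    let entry = this.entries.get(k);
    if (!entry) {
      // Store the pending promise so concurrent callers share one synthesis run.
      entry = this.synthesize(text, voice).catch((err) => {
        // Evict failed entries so a later call can retry.
        this.entries.delete(k);
        throw err;
      });
      this.entries.set(k, entry);
    }
    return entry;
  }
}
```

Caching the promise rather than the finished audio means two callers asking for the same greeting at the same moment trigger a single synthesis. A persistent variant would presumably write WAV files to disk instead of holding buffers in memory, but that detail is not visible in this diff.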