feat(proxy-engine): add on-demand TTS caching for voicemail and IVR prompts
readme.md (11 lines changed)
@@ -20,7 +20,7 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
 - 🎯 **Adaptive Jitter Buffer** — Per-leg jitter buffering with sequence-based reordering, adaptive depth (60–120ms), Opus PLC for lost packets, and hold/resume detection
 - 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
 - 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
-- 🗣️ **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets, backed by espeak-ng fallback
+- 🗣️ **Neural TTS** — Kokoro-powered greetings and IVR prompts with 25+ voice presets
 - 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
 - 🖥️ **Web Dashboard** — Real-time SPA with 9 views: live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs
@@ -98,7 +98,6 @@ sequenceDiagram
 - **Node.js** ≥ 20 with `tsx` globally available
 - **pnpm** for package management
 - **Rust** toolchain (for building the proxy engine)
-- **espeak-ng** (optional, for TTS fallback)

 ### Install & Build
@@ -190,7 +189,7 @@ Create `.nogit/config.json`:

 ### TTS Setup (Optional)

-For neural announcements and voicemail greetings, download the Kokoro TTS model:
+For neural voicemail greetings and IVR prompts, download the Kokoro TTS model:

 ```bash
 mkdir -p .nogit/tts
@@ -200,7 +199,7 @@ curl -L -o .nogit/tts/voices.bin \
 https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin
 ```

-Without the model files, TTS falls back to `espeak-ng`. Without either, announcements are skipped — everything else works fine.
+Without the model files, TTS prompts (IVR menus, voicemail greetings) are skipped — everything else works fine.

 ### Run
@@ -227,7 +226,6 @@ siprouter/
 │ ├── frontend.ts # Web dashboard HTTP/WS server + REST API
 │ ├── webrtcbridge.ts # WebRTC signaling layer
 │ ├── registrar.ts # Browser softphone registration
-│ ├── announcement.ts # TTS announcement generator (espeak-ng / Kokoro)
 │ ├── voicebox.ts # Voicemail box management
 │ └── call/
 │ └── prompt-cache.ts # Named audio prompt WAV management
@@ -288,13 +286,12 @@ flowchart LR

 ## 🗣️ Neural TTS

-Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
+Voicemail greetings and IVR prompts are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:

 - **24 kHz, 16-bit mono** output
 - **25+ voice presets** — American/British, male/female (e.g., `af_bella`, `am_adam`, `bf_emma`, `bm_george`)
 - **~800ms** synthesis time for a 3-second phrase
 - Lazy-loaded on first use — no startup cost if TTS is unused
-- Falls back to `espeak-ng` if the ONNX model is not available

 ---
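
The commit title mentions on-demand TTS caching, but the readme diff above does not show how it works. As a rough illustration of the general idea only, here is a minimal in-memory sketch: a prompt is synthesized the first time it is requested and reused afterwards. Every name here (`PromptCache`, `Synthesize`, `fakeSynthesize`) is hypothetical and not taken from the siprouter codebase; the real engine is written in Rust and may well cache WAV files on disk instead. The only detail borrowed from the readme is the 24 kHz, 16-bit mono output format.

```typescript
// Hypothetical sketch of on-demand prompt caching.
// None of these names come from the siprouter codebase.

// A synthesizer turns (text, voice) into 16-bit mono PCM samples.
type Synthesize = (text: string, voice: string) => Int16Array;

// Output sample rate stated in the readme (24 kHz, 16-bit mono).
const SAMPLE_RATE_HZ = 24000;

class PromptCache {
  private readonly entries = new Map<string, Int16Array>();
  private readonly synthesize: Synthesize;

  // Number of times the underlying synthesizer was actually invoked.
  synthesisCount = 0;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  // Return PCM for (text, voice), synthesizing only on the first request.
  get(text: string, voice: string): Int16Array {
    // JSON-encode the pair so that different (voice, text) pairs never collide.
    const key = JSON.stringify([voice, text]);
    let pcm = this.entries.get(key);
    if (pcm === undefined) {
      pcm = this.synthesize(text, voice);
      this.synthesisCount += 1;
      this.entries.set(key, pcm);
    }
    return pcm;
  }
}

// Stand-in synthesizer: one second of silence instead of a real TTS model.
const fakeSynthesize: Synthesize = () => new Int16Array(SAMPLE_RATE_HZ);

const cache = new PromptCache(fakeSynthesize);
const first = cache.get("Press 1 for sales.", "af_bella");
const second = cache.get("Press 1 for sales.", "af_bella"); // served from the cache
const other = cache.get("Press 1 for sales.", "bm_george"); // new voice, new entry
```

Keying on both voice and text means that changing a prompt's wording or voice preset naturally produces a fresh entry, while repeated plays of the same IVR menu skip the synthesis cost.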