feat(readme): expand documentation for voicemail, IVR, audio engine, and API capabilities

2026-04-10 17:25:34 +00:00
parent f543ff1568
commit e4935fbf21
4 changed files with 198 additions and 122 deletions


@@ -1,5 +1,12 @@
# Changelog
## 2026-04-10 - 1.18.0 - feat(readme)
expand documentation for voicemail, IVR, audio engine, and API capabilities
- Updates the feature overview to document voicemail, IVR menus, call recording, enhanced TTS, and the 48kHz float audio engine
- Refreshes the architecture section to describe the TypeScript control plane, Rust proxy-engine data plane, and JSON-over-stdio IPC
- Clarifies REST API and WebSocket coverage with voicemail endpoints, incoming call events, and refined endpoint descriptions
## 2026-04-10 - 1.17.2 - fix(proxy-engine)
use negotiated SDP payload types when wiring SIP legs and enable default nnnoiseless features for telephony denoising

readme.md

@@ -1,6 +1,6 @@
# @serve.zone/siprouter
A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS announcements, and a slick web dashboard.
A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS, voicemail, IVR menus, and a slick web dashboard.
## Issue Reporting and Security
@@ -12,14 +12,16 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community
siprouter sits between your SIP trunk providers and your endpoints — hardware phones, ATAs, browser softphones — and handles **everything** in between:
- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management
- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional audio to the SIP network
- 🎛️ **Multi-Provider Trunking** — Register with multiple SIP providers simultaneously (sipgate, easybell, o2, etc.)
- 🔊 **Rust Codec Engine** — Real-time Opus ↔ G.722 ↔ PCMU ↔ PCMA transcoding in native Rust
- 🤖 **ML Noise Suppression** — RNNoise denoiser with per-direction state (to SIP / to browser)
- 🗣️ **Neural TTS** — Kokoro-powered "connecting your call" announcements, pre-encoded for instant playback
- 🔀 **Hub Model Calls** — N-leg calls with dynamic add/remove, transfer, and RTP fan-out
- 🖥 **Web Dashboard** — Real-time SPA with live call monitoring, browser phone, contact management, provider config
- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management, digest auth, and SDP negotiation
- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional Opus audio to the SIP network
- 🎛️ **Multi-Provider Trunking** — Register with multiple SIP providers simultaneously (sipgate, easybell, etc.) with automatic failover
- 🎧 **48kHz f32 Audio Engine** — High-fidelity internal audio bus at 48kHz/32-bit float with native Opus float encode/decode, FFT-based resampling, and per-leg ML noise suppression
- 🔀 **N-Leg Mix-Minus Mixer** — Conference-grade mixing with dynamic leg add/remove, transfer, and per-source audio separation
- 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
- 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
- 🗣 **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets, backed by espeak-ng fallback
- 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
- 🖥️ **Web Dashboard** — Real-time SPA with 9 views: live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs
---
@@ -35,32 +37,38 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
┌──────────────────────────────────────┐
│              siprouter               │
│                                      │
│  ┌──────────┐  ┌──────────────────┐  │
│  │ Call Hub │  │ Rust Transcoder  │  │
│  │ N legs   │──│ Opus/G.722/PCM   │  │
│  │ fan-out  │  │ + RNNoise        │  │
│  └────┬─────┘  └──────────────────┘  │
│  ┌────┴─────┐  ┌──────────────────┐  │
│  │ SIP Stack│  │ Kokoro TTS       │  │
│  │ Dialog SM│  │ (ONNX Runtime)   │  │
│  └────┬─────┘  └──────────────────┘  │
│  ┌────┴──────────────────────────┐   │
│  │ Local Registrar + Provider    │   │
│  │ Registration Engine           │   │
│  └───────────────────────────────┘   │
└──────────┬──────────────┬────────────┘
│  TypeScript Control Plane            │
│  ┌────────────────────────────────┐  │
│  │ Config · WebRTC Signaling      │  │
│  │ REST API · Web Dashboard       │  │
│  │ Voicebox Manager · TTS Cache   │  │
│  └────────────┬───────────────────┘  │
│        JSON-over-stdio IPC           │
│  ┌────────────┴───────────────────┐  │
│  │ Rust proxy-engine (data plane) │  │
│  │                                │  │
│  │ SIP Stack · Dialog SM · Auth   │  │
│  │ Call Manager · N-Leg Mixer     │  │
│  │ 48kHz f32 Bus · RNNoise        │  │
│  │ Codec Engine · RTP Port Pool   │  │
│  │ WebRTC Engine · Kokoro TTS     │  │
│  │ Voicemail · IVR · Recording    │  │
│  └────┬──────────────────┬────────┘  │
└───────┤──────────────────┤───────────┘
        │                  │
 ┌──────┴──────┐ ─────┴──────┐
 ┌──────┴──────┐    ┌──────┴──────┐
 │ SIP Devices │    │ SIP Trunk   │
 │ (HT801, etc)│    │ Providers   │
 └─────────────┘ ────────────┘
 │ (HT801 etc) │    │ Providers   │
 └─────────────┘    └─────────────┘
```
### The Hub Model
### 🧠 Key Design Decisions
Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware device or provider) or a `WebRtcLeg` (browser). RTP flows through the hub — each leg's received audio is forwarded to all other legs, with codec transcoding handled transparently by the Rust engine.
- **Hub Model** — Every call is a hub with N legs. Each leg is a `SipLeg` (device/provider) or `WebRtcLeg` (browser). Legs can be dynamically added, removed, or transferred without tearing down the call.
- **Rust Data Plane** — All SIP protocol handling, codec transcoding, mixing, and RTP I/O runs in native Rust for real-time performance. TypeScript handles config, signaling, REST API, and dashboard.
- **48kHz f32 Internal Bus** — Audio is processed at maximum quality internally. Encoding/decoding to wire format (G.722, PCMU, Opus) happens solely at the leg boundary.
- **Per-Session Codec Isolation** — Each call leg gets its own encoder/decoder/resampler/denoiser state — no cross-call corruption.
- **SDP Codec Negotiation** — Outbound encoding uses the codec actually negotiated in SDP answers, not just the first offered codec.
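
As an illustration of the SDP codec negotiation point, here is a minimal TypeScript sketch of picking the outbound payload type from an SDP answer rather than from the offer. The function names, the sample SDP, and the default of 101 for telephone-event are assumptions made for this example; the actual logic lives in the Rust proxy-engine and may differ.

```typescript
// Hypothetical helpers for illustration only; not siprouter's actual API.

// Returns the payload types listed on the first m=audio line of an SDP body.
function audioPayloadTypes(sdp: string): number[] {
  for (const line of sdp.split(/\r?\n/)) {
    // Format: m=audio <port> <proto> <fmt> <fmt> ...
    const match = /^m=audio\s+\d+(?:\/\d+)?\s+\S+\s+(.+)$/.exec(line.trim());
    const formats = match?.[1];
    if (formats) {
      return formats.trim().split(/\s+/).map(Number).filter(Number.isInteger);
    }
  }
  return [];
}

// Picks the first payload type in the ANSWER that we can encode.
// ASSUMPTION: telephone-event uses PT 101 and is skipped because it is not an audio codec.
function negotiatedPayloadType(
  answerSdp: string,
  supported: ReadonlySet<number>,
  dtmfPt = 101,
): number | undefined {
  return audioPayloadTypes(answerSdp).find((pt) => pt !== dtmfPt && supported.has(pt));
}

// We offered G.722 (9), PCMU (0), PCMA (8); the remote side answered with PCMA only.
const sampleAnswer = [
  "v=0",
  "o=- 1 1 IN IP4 192.0.2.10",
  "s=-",
  "c=IN IP4 192.0.2.10",
  "t=0 0",
  "m=audio 20000 RTP/AVP 8 101",
  "a=rtpmap:8 PCMA/8000",
  "a=rtpmap:101 telephone-event/8000",
].join("\r\n");

const chosenPt = negotiatedPayloadType(sampleAnswer, new Set([9, 0, 8]));
```

With this answer the sketch selects payload type 8 (PCMA), even though G.722 was first in the offer.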
---
@@ -70,15 +78,16 @@ Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware dev
- **Node.js** ≥ 20 with `tsx` globally available
- **pnpm** for package management
- **Rust** toolchain (for building the codec engine and TTS)
- **Rust** toolchain (for building the proxy engine)
- **espeak-ng** (optional, for TTS fallback)
### Install & Build
```bash
# Clone and install
# Clone and install dependencies
pnpm install
# Build the Rust binaries (opus-codec + tts-engine)
# Build the Rust proxy-engine binary
pnpm run buildRust
# Bundle the web frontend
@@ -87,57 +96,92 @@ pnpm run bundle
### Configuration
Create `.nogit/config.json` with your setup:
Create `.nogit/config.json`:
```jsonc
{
"proxy": {
"lanIp": "192.168.1.100", // Your server's LAN IP
"lanPort": 5070, // SIP signaling port
"rtpPortRange": [20000, 20200],// RTP relay port pool (even ports)
"webUiPort": 3060 // Dashboard port
"publicIpSeed": "stun.example.com", // STUN server for public IP discovery
"rtpPortRange": { "min": 20000, "max": 20200 }, // RTP port pool (even ports)
"webUiPort": 3060 // Dashboard + REST API port
},
"providers": [
{
"id": "my-trunk",
"name": "My SIP Provider",
"host": "sip.provider.com",
"port": 5060,
"displayName": "My SIP Provider",
"domain": "sip.provider.com",
"outboundProxy": { "address": "sip.provider.com", "port": 5060 },
"username": "user",
"password": "pass",
"codecs": ["G.722", "PCMA", "PCMU"],
"registerExpiry": 3600
"codecs": [9, 0, 8, 101], // G.722, PCMU, PCMA, telephone-event
"registerIntervalSec": 300
}
],
"devices": [
{
"id": "desk-phone",
"name": "Desk Phone",
"type": "sip"
"displayName": "Desk Phone",
"expectedAddress": "192.168.1.50",
"extension": "100"
}
],
"routing": {
"inbound": {
"default": { "target": "all-devices", "ringBrowser": true }
"routes": [
{
"id": "inbound-default",
"name": "Ring all devices",
"priority": 100,
"direction": "inbound",
"match": {},
"action": {
"targets": ["desk-phone"],
"ringBrowsers": true,
"voicemailBox": "main",
"noAnswerTimeout": 25
}
},
{
"id": "outbound-default",
"name": "Route via trunk",
"priority": 100,
"direction": "outbound",
"match": {},
"action": { "provider": "my-trunk" }
}
]
},
"voiceboxes": [
{
"id": "main",
"enabled": true,
"greetingText": "Please leave a message after the beep.",
"greetingVoice": "af_bella",
"noAnswerTimeoutSec": 25,
"maxRecordingSec": 120,
"maxMessages": 50
}
],
"contacts": [
{ "id": "1", "name": "Alice", "number": "+491234567890", "starred": true }
]
}
```
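
To make the `routes` entries in the new config format more concrete, the following TypeScript sketch picks a route for a given direction. It is only an illustration: the interface mirrors the fields visible in the sample, while the rule that a higher `priority` wins and the caller-supplied `matches` predicate are assumptions, since the matching semantics are not described in this diff.

```typescript
// Shapes mirror the sample config; everything else is assumed for illustration.
type RouteDirection = "inbound" | "outbound";

interface RouteSketch {
  id: string;
  name: string;
  priority: number;
  direction: RouteDirection;
  match: Record<string, unknown>;
  action: Record<string, unknown>;
}

// ASSUMPTION: the highest priority wins and ties keep config order.
// ASSUMPTION: `matches` is supplied by the caller because match rules are not documented here.
function pickRoute(
  routes: readonly RouteSketch[],
  direction: RouteDirection,
  matches: (route: RouteSketch) => boolean = () => true,
): RouteSketch | undefined {
  let best: RouteSketch | undefined;
  for (const route of routes) {
    if (route.direction !== direction || !matches(route)) continue;
    if (best === undefined || route.priority > best.priority) best = route;
  }
  return best;
}

const sampleRoutes: RouteSketch[] = [
  {
    id: "inbound-default",
    name: "Ring all devices",
    priority: 100,
    direction: "inbound",
    match: {},
    action: { targets: ["desk-phone"], ringBrowsers: true, voicemailBox: "main", noAnswerTimeout: 25 },
  },
  {
    id: "outbound-default",
    name: "Route via trunk",
    priority: 100,
    direction: "outbound",
    match: {},
    action: { provider: "my-trunk" },
  },
];

const outboundRoute = pickRoute(sampleRoutes, "outbound");
```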
### TTS Setup (Optional)
For neural "connecting your call" announcements, download the Kokoro TTS model:
For neural announcements and voicemail greetings, download the Kokoro TTS model:
```bash
mkdir -p .nogit/tts
# Download the full-quality model (310MB) + voices (27MB)
curl -L -o .nogit/tts/kokoro-v1.0.onnx \
https://github.com/mzdk100/kokoro/releases/download/V1.0/kokoro-v1.0.onnx
curl -L -o .nogit/tts/voices.bin \
https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin
```
If the model files aren't present, the announcement feature is simply disabled — everything else works fine.
Without the model files, TTS falls back to `espeak-ng`. Without either, announcements are skipped — everything else works fine.
### Run
@@ -145,7 +189,7 @@ If the model files aren't present, the announcement feature is simply disabled
pnpm start
```
The SIP proxy starts on the configured port and the web dashboard is available at `http://<your-ip>:3060`.
The SIP proxy starts on the configured port and the web dashboard is available at `https://<your-ip>:3060`.
### HTTPS (Optional)
@@ -157,68 +201,91 @@ Place `cert.pem` and `key.pem` in `.nogit/` for TLS on the dashboard.
```
siprouter/
├── ts/ # TypeScript source
├── ts/ # TypeScript control plane
│ ├── sipproxy.ts # Main entry — bootstraps everything
│ ├── config.ts # Config loader & validation
│ ├── registrar.ts # Local SIP registrar for devices
│ ├── providerstate.ts # Per-provider upstream registration engine
│ ├── proxybridge.ts # Rust proxy-engine IPC bridge (smartrust)
│ ├── frontend.ts # Web dashboard HTTP/WS server + REST API
│ ├── webrtcbridge.ts # WebRTC signaling layer
│ ├── opusbridge.ts # Rust IPC bridge (smartrust)
│ ├── codec.ts # High-level RTP transcoding interface
│ ├── announcement.ts # Neural TTS announcement generator
── sip/ # Zero-dependency SIP protocol library
── message.ts # SIP message parser/builder/mutator
│ ├── dialog.ts # RFC 3261 dialog state machine
│ │ ├── helpers.ts # SDP builder, digest auth, codec registry
│ │ └── rewrite.ts # SIP URI + SDP body rewriting
│ └── call/ # Hub-model call management
│ ├── call-manager.ts # Central registry, factory, routing
│ ├── call.ts # Call hub — owns N legs, media fan-out
│ ├── sip-leg.ts # SIP device/provider connection
│ ├── webrtc-leg.ts # Browser WebRTC connection
│ └── rtp-port-pool.ts # UDP port allocation
│ ├── registrar.ts # Browser softphone registration
│ ├── announcement.ts # TTS announcement generator (espeak-ng / Kokoro)
│ ├── voicebox.ts # Voicemail box management
── call/
── prompt-cache.ts # Named audio prompt WAV management
├── ts_web/ # Web frontend (Lit-based SPA)
│ ├── elements/ # Web components (dashboard, phone, etc.)
│ ├── elements/ # Web components (9 dashboard views)
│ └── state/ # App state, WebRTC client, notifications
├── rust/ # Rust workspace
├── rust/ # Rust workspace (the data plane)
│ └── crates/
│ ├── opus-codec/ # Real-time audio transcoder (Opus/G.722/PCM)
── tts-engine/ # Kokoro neural TTS CLI
│ ├── codec-lib/ # Audio codec library (Opus/G.722/PCMU/PCMA)
── sip-proto/ # Zero-dependency SIP protocol library
│ └── proxy-engine/ # Main binary — SIP engine + mixer + RTP
├── html/ # Static HTML shell
├── .nogit/ # Secrets, config, models (gitignored)
└── dist_rust/ # Compiled Rust binaries (gitignored)
├── .nogit/ # Secrets, config, TTS models (gitignored)
└── dist_rust/ # Compiled Rust binary (gitignored)
```
---
## 🎧 Codec Engine (Rust)
## 🎧 Audio Engine (Rust)
The `opus-codec` binary handles all real-time audio processing via a JSON-over-stdio IPC protocol:
The `proxy-engine` binary handles all real-time audio processing with a **48kHz f32 internal bus**; encoding and decoding happen only at leg boundaries.
| Codec | Payload Type | Sample Rate | Use Case |
|-------|-------------|-------------|----------|
| **Opus** | 111 | 48 kHz | WebRTC browsers |
| **G.722** | 9 | 16 kHz | HD SIP devices |
### Supported Codecs
| Codec | PT | Native Rate | Use Case |
|-------|:--:|:-----------:|----------|
| **Opus** | 111 | 48 kHz | WebRTC browsers (native float encode/decode — zero i16 quantization) |
| **G.722** | 9 | 16 kHz | HD SIP devices & providers |
| **PCMU** (G.711 µ-law) | 0 | 8 kHz | Legacy SIP |
| **PCMA** (G.711 A-law) | 8 | 8 kHz | Legacy SIP |
**Features:**
- Per-call isolated codec sessions (no cross-call state corruption)
- FFT-based sample rate conversion via `rubato`
- **RNNoise ML noise suppression** with per-direction state — denoises audio flowing to SIP separately from audio flowing to the browser
- Raw PCM encoding for TTS frame processing
### Audio Pipeline
```
Inbound: Wire RTP → Decode → Resample to 48kHz → Denoise (RNNoise) → Mix Bus
Outbound: Mix Bus → Mix-Minus → Resample to codec rate → Encode → Wire RTP
```
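
The Mix-Minus stage of the outbound path can be sketched as follows. This is a TypeScript illustration of the general technique (each leg receives the sum of all other legs), not the Rust mixer itself; the function name and the hard clip to [-1, 1] are assumptions.

```typescript
// Mix-minus over one frame: output[i] = sum of every input except input[i].
// Inputs are f32 frames of equal length; sums are accumulated in f64.
function mixMinus(frames: readonly Float32Array[]): Float32Array[] {
  const length = frames[0]?.length ?? 0;
  const total = new Float64Array(length);

  // First pass: accumulate the full mix once.
  for (const frame of frames) {
    if (frame.length !== length) {
      throw new Error("all frames must have the same length");
    }
    for (let i = 0; i < length; i++) total[i] += frame[i];
  }

  // Second pass: subtract each leg's own contribution from the full mix.
  return frames.map((frame) => {
    const out = new Float32Array(length);
    for (let i = 0; i < length; i++) {
      const sample = total[i] - frame[i];
      // ASSUMPTION: simple hard clip; a real mixer might use a limiter instead.
      out[i] = Math.max(-1, Math.min(1, sample));
    }
    return out;
  });
}
```

Computing the full sum once and subtracting per leg keeps the cost linear in the number of legs instead of quadratic.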
- **FFT-based resampling** via `rubato` — high-quality sinc interpolation with cached resampler state for seamless inter-frame continuity
- **ML noise suppression** via `nnnoiseless` (RNNoise) — per-leg inbound denoising with SIMD acceleration (AVX/SSE). Skipped for WebRTC legs (browsers already denoise via getUserMedia)
- **Mix-minus mixing** — each participant hears everyone except themselves, accumulated in f64 precision
- **In-tick packet reorder** — inbound RTP packets are sorted by sequence number before decoding, protecting G.722 ADPCM state from out-of-order delivery
- **RFC 3550 compliant header parsing** — properly handles CSRC lists and header extensions
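
The in-tick packet reorder can be illustrated with a small TypeScript sketch that sorts packets by RTP sequence number while handling 16-bit wraparound. The type and function names are hypothetical, and the sketch assumes the packets within one tick span far fewer than 32768 sequence numbers.

```typescript
// Hypothetical packet shape for illustration.
interface RtpPacketSketch {
  seq: number; // 16-bit RTP sequence number (0..65535)
  payload: Uint8Array;
}

// Signed distance from b to a in 16-bit sequence space.
// The shift pair sign-extends the low 16 bits, so 0 compares as "after" 65535.
function seqDelta(a: number, b: number): number {
  return (((a - b) & 0xffff) << 16) >> 16;
}

// Returns a sorted copy; only valid when all packets lie within a window
// smaller than half the sequence space, which holds for a single tick.
function reorderTick(packets: readonly RtpPacketSketch[]): RtpPacketSketch[] {
  return [...packets].sort((x, y) => seqDelta(x.seq, y.seq));
}
```

Decoding in sequence order matters for stateful codecs such as G.722, where each frame updates the ADPCM predictor used by the next one.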
---
## 🗣️ Neural TTS (Rust)
## 🗣️ Neural TTS
The `tts-engine` binary uses [Kokoro TTS](https://github.com/mzdk100/kokoro) (82M parameter neural model) to synthesize announcements at startup:
Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
- **24 kHz, 16-bit mono** output
- **25+ voice presets** — American/British, male/female (e.g., `af_bella`, `am_adam`, `bf_emma`, `bm_george`)
- **~800ms** synthesis time for a 3-second announcement
- Pre-encoded to G.722 + Opus for zero-latency RTP playback during call setup
- **~800ms** synthesis time for a 3-second phrase
- Lazy-loaded on first use — no startup cost if TTS is unused
- Falls back to `espeak-ng` if the ONNX model is not available
---
## 📧 Voicemail
- Configurable voicemail boxes with custom TTS greetings
- Automatic routing on no-answer timeout
- Recording with configurable max duration and message count
- Web dashboard playback and management
- WAV storage in `.nogit/voicemail/`
---
## 🔢 IVR (Interactive Voice Response)
- DTMF-navigable menus with configurable entries
- Actions: route to extension, route to voicemail, transfer, submenu, hangup, repeat prompt
- Custom TTS prompts per menu
- Nested menu support
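
One way to picture DTMF navigation with nested menus is the TypeScript sketch below. The action kinds follow the list above, but every type, field name, and the handling of unmapped digits are assumptions for illustration; the real IVR configuration schema is not shown in this diff.

```typescript
// Hypothetical data model; field names are illustrative only.
type IvrActionSketch =
  | { kind: "extension"; extension: string }
  | { kind: "voicemail"; box: string }
  | { kind: "transfer"; number: string }
  | { kind: "submenu"; menuId: string }
  | { kind: "hangup" }
  | { kind: "repeat" };

interface IvrMenuSketch {
  id: string;
  prompt: string; // text to synthesize via TTS
  entries: Record<string, IvrActionSketch>; // keyed by DTMF digit
}

type IvrStep =
  | { next: "menu"; menuId: string } // play this menu's prompt and wait for a digit
  | { next: "done"; action: IvrActionSketch }; // leave the IVR with a terminal action

function handleDigit(
  menus: ReadonlyMap<string, IvrMenuSketch>,
  currentMenuId: string,
  digit: string,
): IvrStep {
  const menu = menus.get(currentMenuId);
  if (!menu) throw new Error(`unknown menu: ${currentMenuId}`);

  const action = menu.entries[digit];
  // ASSUMPTION: unmapped digits behave like "repeat" and replay the current prompt.
  if (!action || action.kind === "repeat") return { next: "menu", menuId: menu.id };

  if (action.kind === "submenu") {
    if (!menus.has(action.menuId)) throw new Error(`unknown submenu: ${action.menuId}`);
    return { next: "menu", menuId: action.menuId };
  }
  return { next: "done", action };
}
```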
---
@@ -228,25 +295,32 @@ The `tts-engine` binary uses [Kokoro TTS](https://github.com/mzdk100/kokoro) (82
| View | Description |
|------|-------------|
| **Overview** | Stats tiles — uptime, providers, devices, active calls |
| **Calls** | Active calls with leg details, codec info, packet counters. Add/remove legs, transfer, hangup |
| **Phone** | Browser softphone — mic/speaker selection, audio meters, dial pad, incoming call popup |
| **Contacts** | Contact management with click-to-call |
| **Providers** | SIP trunk config with registration status |
| **Log** | Live streaming log viewer |
| 📊 **Overview** | Stats tiles — uptime, providers, devices, active calls |
| 📞 **Calls** | Active calls with leg details, codec info, add/remove legs, transfer, hangup |
| ☎️ **Phone** | Browser softphone — mic/speaker selection, audio meters, dial pad, incoming call popup |
| 🔀 **Routes** | Routing rule management — match/action model with priority |
| 📧 **Voicemail** | Voicemail box management + message playback |
| 🔢 **IVR** | IVR menu builder — DTMF entries, TTS prompts, nested menus |
| 👤 **Contacts** | Contact management with click-to-call |
| 🔌 **Providers** | SIP trunk configuration and registration status |
| 📋 **Log** | Live streaming log viewer |
### REST API
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/status` | GET | Full system status (providers, devices, calls) |
| `/api/status` | GET | Full system status (providers, devices, calls, history) |
| `/api/call` | POST | Originate a call |
| `/api/hangup` | POST | Hang up a call |
| `/api/call/:id/addleg` | POST | Add a leg to an active call |
| `/api/call/:id/addexternal` | POST | Add an external participant |
| `/api/call/:id/addleg` | POST | Add a device leg to an active call |
| `/api/call/:id/addexternal` | POST | Add an external participant via provider |
| `/api/call/:id/removeleg` | POST | Remove a leg from a call |
| `/api/transfer` | POST | Transfer a call |
| `/api/config` | GET/POST | Read or update configuration (hot-reload) |
| `/api/config` | GET | Read current configuration |
| `/api/config` | POST | Update configuration (hot-reload) |
| `/api/voicemail/:box` | GET | List voicemail messages |
| `/api/voicemail/:box/:id` | DELETE | Delete a voicemail message |
| `/api/voicemail/:box/:id/audio` | GET | Stream voicemail audio |
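
A minimal TypeScript client sketch for the voicemail endpoints listed above follows. The paths and methods come from the table; the helper names, the injected `FetchLike` type, and the untyped response are assumptions, because request and response bodies are not documented in this diff.

```typescript
// Minimal structural type so the sketch can run with the global fetch or with a test double.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds /api/voicemail/:box or /api/voicemail/:box/:id with URL-safe segments.
function voicemailPath(box: string, messageId?: string): string {
  const base = `/api/voicemail/${encodeURIComponent(box)}`;
  return messageId === undefined ? base : `${base}/${encodeURIComponent(messageId)}`;
}

async function listVoicemail(fetchFn: FetchLike, baseUrl: string, box: string): Promise<unknown> {
  const res = await fetchFn(baseUrl + voicemailPath(box));
  if (!res.ok) throw new Error(`GET voicemail failed: HTTP ${res.status}`);
  // The response shape is not documented here, so it stays `unknown`.
  return res.json();
}

async function deleteVoicemail(
  fetchFn: FetchLike,
  baseUrl: string,
  box: string,
  messageId: string,
): Promise<void> {
  const res = await fetchFn(baseUrl + voicemailPath(box, messageId), { method: "DELETE" });
  if (!res.ok) throw new Error(`DELETE voicemail failed: HTTP ${res.status}`);
}
```

On Node.js 20 or in a browser, the global `fetch` can be passed as `fetchFn`, for example `listVoicemail(fetch, "http://192.168.1.100:3060", "main")`, where the host and box id are taken from the sample configuration.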
### WebSocket Events
@@ -255,6 +329,8 @@ Connect to `/ws` for real-time push:
```jsonc
{ "type": "status", "data": { ... } } // Full status snapshot (1s interval)
{ "type": "log", "data": { "message": "..." } } // Log lines in real-time
{ "type": "incoming_call", "data": { ... } } // Incoming call notification
{ "type": "call_ended", "data": { ... } } // Call ended notification
```
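
A client consuming these messages might model them as a discriminated union, as in the TypeScript sketch below. The four `type` values and the `message` field of `log` come from the listing above; the payloads of the other events are elided there, so they are typed as `unknown`. All names are illustrative.

```typescript
type DashboardEvent =
  | { type: "status"; data: unknown }
  | { type: "log"; data: { message: string } }
  | { type: "incoming_call"; data: unknown }
  | { type: "call_ended"; data: unknown };

const knownEventTypes: readonly string[] = ["status", "log", "incoming_call", "call_ended"];

// Parses one WebSocket text frame; returns undefined for anything unrecognized.
function parseDashboardEvent(raw: string): DashboardEvent | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;

  const { type, data } = parsed as { type?: unknown; data?: unknown };
  if (typeof type !== "string" || knownEventTypes.indexOf(type) === -1) return undefined;

  if (type === "log") {
    const message = (data as { message?: unknown } | null | undefined)?.message;
    return typeof message === "string" ? { type: "log", data: { message } } : undefined;
  }
  return { type, data } as DashboardEvent;
}

// Exhaustive switch: the compiler flags any event type that is not handled.
function describeEvent(event: DashboardEvent): string {
  switch (event.type) {
    case "status":
      return "status snapshot";
    case "log":
      return `log: ${event.data.message}`;
    case "incoming_call":
      return "incoming call";
    case "call_ended":
      return "call ended";
  }
}
```

In a browser this would typically be wired to the `message` event of a `WebSocket` opened on `/ws`, passing the frame's string data to `parseDashboardEvent`.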
---
@@ -264,7 +340,7 @@ Connect to `/ws` for real-time push:
| Port | Protocol | Purpose |
|------|----------|---------|
| 5070 (configurable) | UDP | SIP signaling |
| 20000-20200 (configurable) | UDP | RTP relay (even ports, per-call allocation) |
| 20000-20200 (configurable) | UDP | RTP media (even ports, per-call allocation) |
| 3060 (configurable) | TCP | Web dashboard + WebSocket + REST API |
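
As a rough illustration of what the "even ports" note implies for the default range, this TypeScript sketch enumerates the even ports between a minimum and maximum. The function name is hypothetical, and how the proxy-engine actually allocates ports per call is not described here.

```typescript
// Enumerates even UDP ports in [min, max]; odd bounds are rounded inward.
function evenPortsInRange(min: number, max: number): number[] {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min < 1 || max > 65535 || min > max) {
    throw new RangeError(`invalid port range: ${min}..${max}`);
  }
  const ports: number[] = [];
  for (let port = min % 2 === 0 ? min : min + 1; port <= max; port += 2) {
    ports.push(port);
  }
  return ports;
}

// Default range from the table above.
const defaultPool = evenPortsInRange(20000, 20200);
```

For the default range this yields 101 candidate ports. How many of them a single call consumes depends on its number of legs, which this sketch does not model.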
---
@@ -275,23 +351,16 @@ Connect to `/ws` for real-time push:
# Start in dev mode
pnpm start
# Build Rust crates
# Build Rust proxy-engine
pnpm run buildRust
# Bundle web frontend
pnpm run bundle
# Restart background server (build + bundle + restart)
# Build + bundle + restart background server
pnpm run restartBackground
```
### Key Design Decisions
- **Hub Model** — Calls are N-leg hubs, not point-to-point. This enables multi-party, dynamic leg manipulation, and transfer without tearing down the call.
- **Zero-dependency SIP library** — `ts/sip/` is a pure data-level SIP stack (parse/build/mutate/serialize). No transport or timer logic — those live in the application layer.
- **Rust for the hot path** — Codec transcoding and noise suppression run in native Rust for real-time performance. TypeScript handles signaling and orchestration.
- **Per-session codec isolation** — Each call gets its own Opus/G.722 encoder/decoder state in the Rust process, preventing stateful codec prediction from leaking between concurrent calls.
---
## License and Legal Information


@@ -3,6 +3,6 @@
*/
export const commitinfo = {
name: 'siprouter',
version: '1.17.2',
version: '1.18.0',
description: 'undefined'
}


@@ -3,6 +3,6 @@
*/
export const commitinfo = {
name: 'siprouter',
version: '1.17.2',
version: '1.18.0',
description: 'undefined'
}