A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS announcements, and a slick web dashboard.
A production-grade **SIP B2BUA + WebRTC bridge** built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS, voicemail, IVR menus, and a slick web dashboard.
## Issue Reporting and Security
@@ -12,14 +12,16 @@ For reporting bugs, issues, or security vulnerabilities, please visit [community
siprouter sits between your SIP trunk providers and your endpoints — hardware phones, ATAs, browser softphones — and handles **everything** in between:
- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management
- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional audio to the SIP network
- 🔀 **Hub Model Calls** — N-leg calls with dynamic add/remove, transfer, and RTP fan-out
- 🖥️ **Web Dashboard** — Real-time SPA with live call monitoring, browser phone, contact management, provider config
- 📞 **SIP B2BUA** — Terminates and re-originates calls with full RFC 3261 dialog state management, digest auth, and SDP negotiation
- 🌐 **WebRTC Bridge** — Browser-based softphone with bidirectional Opus audio to the SIP network
- 🎛️ **Multi-Provider Trunking** — Register with multiple SIP providers simultaneously (sipgate, easybell, etc.) with automatic failover
- 🎧 **48kHz f32 Audio Engine** — High-fidelity internal audio bus at 48kHz/32-bit float with native Opus float encode/decode, FFT-based resampling, and per-leg ML noise suppression
- 🔀 **N-Leg Mix-Minus Mixer** — Conference-grade mixing with dynamic leg add/remove, transfer, and per-source audio separation
- 📧 **Voicemail** — Configurable voicemail boxes with TTS greetings, recording, and web playback
- 🔢 **IVR Menus** — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
- 🗣️ **Neural TTS** — Kokoro-powered announcements and greetings with 25+ voice presets, backed by espeak-ng fallback
- 🎙️ **Call Recording** — Per-source separated WAV recording at 48kHz via tool legs
- 🖥️ **Web Dashboard** — Real-time SPA with 9 views: live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs
---
@@ -35,32 +37,38 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
┌──────────────────────────────────────┐
│ siprouter │
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Call Hub │ │ Rust Transcoder │ │
│ │ N legs │──│ Opus/G.722/PCM │ │
│ │ fan-out │ │ + RNNoise │ │
│ └────┬─────┘ └──────────────────┘ │
│ │ │
│ ┌────┴─────┐ ┌──────────────────┐ │
│ │ SIP Stack│ │ Kokoro TTS │ │
│ │ Dialog SM│ │ (ONNX Runtime) │ │
│ └────┬─────┘ └──────────────────┘ │
│ │ │
│ ┌────┴──────────────────────────┐ │
│ │ Local Registrar + Provider │ │
│ │ Registration Engine │ │
│ └───────────────────────────────┘ │
└──────────┬──────────────┬────────────┘
│ │
┌──────┴──────┐ ┌─────┴──────┐
│ SIP Devices │ │ SIP Trunk │
│ (HT801, etc)│ │ Providers │
└─────────────┘ └────────────┘
│ TypeScript Control Plane │
│ ┌────────────────────────────────┐ │
│ │ Config · WebRTC Signaling │ │
│ │ REST API · Web Dashboard │ │
│ │ Voicebox Manager · TTS Cache │ │
│ └────────────┬───────────────────┘ │
│ JSON-over-stdio IPC │
│ ┌────────────┴───────────────────┐ │
│ │ Rust proxy-engine (data plane) │ │
│ │ │ │
│ │ SIP Stack · Dialog SM · Auth│ │
│ │ Call Manager · N-Leg Mixer │ │
│ │ 48kHz f32 Bus · RNNoise │ │
│ │ Codec Engine · RTP Port Pool │ │
│ │ WebRTC Engine · Kokoro TTS │ │
│ │ Voicemail · IVR · Recording │ │
│ └────┬──────────────────┬────────┘ │
└───────┤──────────────────┤───────────┘
│ │
┌──────┴──────┐ ┌──────┴──────┐
│ SIP Devices │ │ SIP Trunk │
│ (HT801 etc) │ │ Providers │
└─────────────┘ └─────────────┘
```
### The Hub Model
### 🧠 Key Design Decisions
Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware device or provider) or a `WebRtcLeg` (browser). RTP flows through the hub — each leg's received audio is forwarded to all other legs, with codec transcoding handled transparently by the Rust engine.
- **Hub Model** — Every call is a hub with N legs. Each leg is a `SipLeg` (device/provider) or `WebRtcLeg` (browser). Legs can be dynamically added, removed, or transferred without tearing down the call.
- **Rust Data Plane** — All SIP protocol handling, codec transcoding, mixing, and RTP I/O runs in native Rust for real-time performance. TypeScript handles config, signaling, REST API, and dashboard.
- **48kHz f32 Internal Bus** — Audio is processed at maximum quality internally. Encoding/decoding to wire format (G.722, PCMU, Opus) happens solely at the leg boundary.
- **Per-Session Codec Isolation** — Each call leg gets its own encoder/decoder/resampler/denoiser state — no cross-call corruption.
- **SDP Codec Negotiation** — Outbound encoding uses the codec actually negotiated in SDP answers, not just the first offered codec.
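
To illustrate the SDP negotiation point, here is a minimal TypeScript sketch of picking the outbound codec from an SDP answer rather than from the offer. The helper `negotiatedAudioCodec` and the `NegotiatedCodec` type are hypothetical names, not taken from siprouter. The sketch assumes a single `m=audio` section and treats the first non-DTMF format listed in the answer as the selected codec; the real engine may apply further rules.

```typescript
// Static RTP payload types (RFC 3551) that may appear without an a=rtpmap line.
const STATIC_PAYLOAD_TYPES: Record<number, string> = {
  0: "PCMU",
  8: "PCMA",
  9: "G722",
};

// Hypothetical result type for this sketch.
interface NegotiatedCodec {
  payloadType: number;
  name: string;
}

// Hypothetical helper: returns the codec the remote side selected in its SDP answer,
// or null if the answer has no usable audio format.
function negotiatedAudioCodec(sdpAnswer: string): NegotiatedCodec | null {
  const lines = sdpAnswer.split(/\r?\n/);
  const mediaLine = lines.find((l) => l.startsWith("m=audio "));
  if (!mediaLine) return null;

  // m=audio <port> <proto> <fmt> <fmt> ...
  const formats = mediaLine.trim().split(/\s+/).slice(3).map(Number);

  // Dynamic payload types are named by a=rtpmap:<pt> <name>/<clock>[/<channels>].
  const rtpmap = new Map<number, string>();
  for (const line of lines) {
    const m = /^a=rtpmap:(\d+)\s+([^/\s]+)\//.exec(line);
    if (m) rtpmap.set(Number(m[1]), m[2]);
  }

  for (const pt of formats) {
    const name = rtpmap.get(pt) ?? STATIC_PAYLOAD_TYPES[pt];
    // telephone-event carries DTMF, not audio, so it is never the encode target.
    if (name && name.toLowerCase() !== "telephone-event") {
      return { payloadType: pt, name };
    }
  }
  return null;
}
```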
---
@@ -70,15 +78,16 @@ Every call is a **hub** with N legs. Each leg is either a `SipLeg` (hardware dev
- **Node.js** ≥ 20 with `tsx` globally available
- **pnpm** for package management
- **Rust** toolchain (for building the codec engine and TTS)
- **Rust** toolchain (for building the proxy engine)
- **espeak-ng** (optional, for TTS fallback)
### Install & Build
```bash
# Clone and install
# Clone and install dependencies
pnpm install
# Build the Rust binaries (opus-codec + tts-engine)
# Build the Rust proxy-engine binary
pnpm run buildRust
# Bundle the web frontend
@@ -87,57 +96,92 @@ pnpm run bundle
### Configuration
Create `.nogit/config.json` with your setup:
Create `.nogit/config.json`:
```jsonc
{
  "proxy": {
    "lanIp": "192.168.1.100",  // Your server's LAN IP
    "lanPort": 5070,  // SIP signaling port
    "rtpPortRange": [20000, 20200],  // RTP relay port pool (even ports)
    "webUiPort": 3060  // Dashboard port
    "lanIp": "192.168.1.100",  // Your server's LAN IP
    "lanPort": 5070,  // SIP signaling port
    "publicIpSeed": "stun.example.com",  // STUN server for public IP discovery
    "rtpPortRange": { "min": 20000, "max": 20200 },  // RTP port pool (even ports)
The `opus-codec` binary handles all real-time audio processing via a JSON-over-stdio IPC protocol:
The `proxy-engine` binary handles all real-time audio processing with a **48kHz f32 internal bus**; encoding and decoding happen only at leg boundaries.
- Per-call isolated codec sessions (no cross-call state corruption)
- FFT-based sample rate conversion via `rubato`
- **RNNoise ML noise suppression** with per-direction state — denoises audio flowing to SIP separately from audio flowing to the browser
- Raw PCM encoding for TTS frame processing
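
As a rough picture of what JSON-over-stdio IPC can look like from the TypeScript side, the sketch below frames each message as one newline-terminated JSON line and reassembles lines from arbitrary stdout chunks. The framing, the `IpcCommand` shape, and the names `encodeCommand` and `LineDecoder` are assumptions made for illustration; the actual protocol spoken by the Rust binary is not specified here and may differ.

```typescript
// Hypothetical message shape; the real protocol fields are not documented here.
interface IpcCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Encode one command as a single newline-terminated JSON line.
// JSON.stringify escapes newlines inside strings, so the framing stays unambiguous.
function encodeCommand(cmd: IpcCommand): string {
  return JSON.stringify(cmd) + "\n";
}

// Accumulates stdout chunks and returns every complete JSON message seen so far.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is a partial line (or "" when the chunk ended on a newline).
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((l) => l.trim().length > 0)
      .map((l) => JSON.parse(l) as unknown);
  }
}
```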
### Audio Pipeline
```
Inbound: Wire RTP → Decode → Resample to 48kHz → Denoise (RNNoise) → Mix Bus
Outbound: Mix Bus → Mix-Minus → Resample to codec rate → Encode → Wire RTP
```
- **FFT-based resampling** via `rubato` — high-quality sinc interpolation with cached resampler state for seamless inter-frame continuity
- **ML noise suppression** via `nnnoiseless` (RNNoise) — per-leg inbound denoising with SIMD acceleration (AVX/SSE). Skipped for WebRTC legs (browsers already denoise via getUserMedia)
- **Mix-minus mixing** — each participant hears everyone except themselves, accumulated in f64 precision
- **In-tick packet reorder** — inbound RTP packets are sorted by sequence number before decoding, protecting G.722 ADPCM state from out-of-order delivery
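
Two of the steps above, sequence-number reordering and mix-minus mixing, can be sketched in a few lines. This is an illustration in TypeScript, not the Rust engine's code: the function names and the string leg IDs are hypothetical, all frames are assumed to have the same length with samples normalized to [-1, 1], and the final clamp is an assumption of this sketch. The reorder comparator relies on all packets in one tick lying within half the 16-bit sequence space of each other.

```typescript
// Signed distance between two 16-bit RTP sequence numbers, tolerant of wraparound.
function seqDelta(a: number, b: number): number {
  const d = (a - b) & 0xffff;
  return d >= 0x8000 ? d - 0x10000 : d;
}

// Return a copy of the packets sorted by sequence number (oldest first).
function reorderBySequence<T extends { seq: number }>(packets: T[]): T[] {
  return [...packets].sort((x, y) => seqDelta(x.seq, y.seq));
}

// Mix-minus: each leg receives the sum of all other legs' frames.
function mixMinus(frames: Map<string, Float32Array>): Map<string, Float32Array> {
  const entries = Array.from(frames.entries());
  const out = new Map<string, Float32Array>();
  if (entries.length === 0) return out;
  const len = entries[0][1].length;

  // Accumulate the full mix once in 64-bit floats.
  const sum = new Float64Array(len);
  for (const [, frame] of entries) {
    for (let i = 0; i < len; i++) sum[i] += frame[i];
  }

  // Subtract each leg's own contribution, then clamp back into [-1, 1].
  for (const [legId, frame] of entries) {
    const mixed = new Float32Array(len);
    for (let i = 0; i < len; i++) {
      const v = sum[i] - frame[i];
      mixed[i] = Math.max(-1, Math.min(1, v));
    }
    out.set(legId, mixed);
  }
  return out;
}
```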
The `tts-engine` binary uses [Kokoro TTS](https://github.com/mzdk100/kokoro) (82M parameter neural model) to synthesize announcements at startup:
Announcements and voicemail greetings are synthesized using [Kokoro TTS](https://github.com/mzdk100/kokoro) — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:
| 3060 (configurable) | TCP | Web dashboard + WebSocket + REST API |
---
@@ -275,23 +351,16 @@ Connect to `/ws` for real-time push:
# Start in dev mode
pnpm start
# Build Rust crates
# Build Rust proxy-engine
pnpm run buildRust
# Bundle web frontend
pnpm run bundle
# Restart background server (build + bundle + restart)
# Build + bundle + restart background server
pnpm run restartBackground
```
### Key Design Decisions
- **Hub Model** — Calls are N-leg hubs, not point-to-point. This enables multi-party, dynamic leg manipulation, and transfer without tearing down the call.
- **Zero-dependency SIP library** — `ts/sip/` is a pure data-level SIP stack (parse/build/mutate/serialize). No transport or timer logic — those live in the application layer.
- **Rust for the hot path** — Codec transcoding and noise suppression run in native Rust for real-time performance. TypeScript handles signaling and orchestration.
- **Per-session codec isolation** — Each call gets its own Opus/G.722 encoder/decoder state in the Rust process, preventing stateful codec prediction from leaking between concurrent calls.