@serve.zone/siprouter

A production-grade SIP B2BUA + WebRTC bridge built with TypeScript and Rust. Routes calls between SIP providers, SIP hardware devices, and browser softphones — with real-time codec transcoding, ML noise suppression, neural TTS, voicemail, IVR menus, and a slick web dashboard.

Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit community.foss.global/. This is the central community hub for all issue reporting. Developers who sign and comply with our contribution agreement and go through identification can also get a code.foss.global/ account to submit Pull Requests directly.


🔥 What It Does

siprouter sits between your SIP trunk providers and your endpoints — hardware phones, ATAs, browser softphones — and handles everything in between:

  • 📞 SIP B2BUA — Terminates and re-originates calls with full RFC 3261 dialog state management, digest auth, and SDP negotiation
  • 🌐 WebRTC Bridge — Browser-based softphone with bidirectional Opus audio to the SIP network
  • 🎛️ Multi-Provider Trunking — Register with multiple SIP providers simultaneously (sipgate, easybell, etc.) with automatic failover
  • 🎧 48kHz f32 Audio Engine — High-fidelity internal audio bus at 48kHz/32-bit float with native Opus float encode/decode, FFT-based resampling, and per-leg ML noise suppression
  • 🔀 N-Leg Mix-Minus Mixer — Conference-grade mixing with dynamic leg add/remove, transfer, and per-source audio separation
  • 📧 Voicemail — Configurable voicemail boxes with TTS greetings, recording, and web playback
  • 🔢 IVR Menus — DTMF-navigable interactive voice response with nested menus, routing actions, and custom prompts
  • 🗣️ Neural TTS — Kokoro-powered announcements and greetings with 25+ voice presets, with espeak-ng as a fallback
  • 🎙️ Call Recording — Per-source separated WAV recording at 48kHz via tool legs
  • 🖥️ Web Dashboard — Real-time SPA with 9 views: overview, live calls, browser phone, routing, voicemail, IVR, contacts, providers, and streaming logs

🏗️ Architecture

┌─────────────────────────────────────┐
│           Browser Softphone          │
│     (WebRTC via WebSocket signaling) │
└──────────────┬──────────────────────┘
               │ Opus/WebRTC
               ▼
┌──────────────────────────────────────┐
│            siprouter                  │
│                                      │
│  TypeScript Control Plane            │
│  ┌────────────────────────────────┐  │
│  │ Config · WebRTC Signaling      │  │
│  │ REST API · Web Dashboard       │  │
│  │ Voicebox Manager · TTS Cache   │  │
│  └────────────┬───────────────────┘  │
│          JSON-over-stdio IPC         │
│  ┌────────────┴───────────────────┐  │
│  │ Rust proxy-engine (data plane) │  │
│  │                                │  │
│  │ SIP Stack · Dialog SM · Auth   │  │
│  │ Call Manager · N-Leg Mixer     │  │
│  │ 48kHz f32 Bus · RNNoise       │  │
│  │ Codec Engine · RTP Port Pool   │  │
│  │ WebRTC Engine · Kokoro TTS     │  │
│  │ Voicemail · IVR · Recording    │  │
│  └────┬──────────────────┬────────┘  │
└───────┤──────────────────┤───────────┘
        │                  │
 ┌──────┴──────┐    ┌──────┴──────┐
 │ SIP Devices │    │ SIP Trunk   │
 │ (HT801 etc) │    │ Providers   │
 └─────────────┘    └─────────────┘

🧠 Key Design Decisions

  • Hub Model — Every call is a hub with N legs. Each leg is a SipLeg (device/provider) or WebRtcLeg (browser). Legs can be dynamically added, removed, or transferred without tearing down the call.
  • Rust Data Plane — All SIP protocol handling, codec transcoding, mixing, and RTP I/O runs in native Rust for real-time performance. TypeScript handles config, signaling, REST API, and dashboard.
  • 48kHz f32 Internal Bus — Audio is processed internally as 48 kHz, 32-bit float samples. Encoding/decoding to wire format (G.722, PCMU, PCMA, Opus) happens only at the leg boundary.
  • Per-Session Codec Isolation — Each call leg gets its own encoder/decoder/resampler/denoiser state — no cross-call corruption.
  • SDP Codec Negotiation — Outbound encoding uses the codec actually negotiated in SDP answers, not just the first offered codec.
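The SDP codec negotiation point can be illustrated with a short sketch. This is not the proxy-engine's code (that lives in Rust); the function name `pickNegotiatedCodec` and the simplified parsing of the `m=audio` line are assumptions made for illustration only.

```typescript
// Hypothetical helper, not part of siprouter's actual API.
// Payload types the local side can encode: G.722, PCMU, PCMA.
const SUPPORTED_PAYLOAD_TYPES = [9, 0, 8];

/**
 * Returns the first payload type listed on the SDP answer's audio m-line
 * that is also supported locally, or null if there is none.
 */
function pickNegotiatedCodec(
  sdpAnswer: string,
  supported: number[] = SUPPORTED_PAYLOAD_TYPES
): number | null {
  for (const line of sdpAnswer.split(/\r?\n/)) {
    // Format: m=audio <port> <proto> <pt> <pt> ...
    if (line.indexOf("m=audio ") !== 0) continue;
    const payloadTypes = line.trim().split(/\s+/).slice(3).map(Number);
    for (const pt of payloadTypes) {
      if (supported.indexOf(pt) !== -1) return pt;
    }
    return null; // audio line found, but no codec in common
  }
  return null; // no audio m-line at all
}

// The offer may have led with G.722 (9), but this answer only accepts PCMU (0).
const answer = [
  "v=0",
  "o=- 1 1 IN IP4 203.0.113.5",
  "s=-",
  "c=IN IP4 203.0.113.5",
  "t=0 0",
  "m=audio 30000 RTP/AVP 0 101",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:101 telephone-event/8000",
].join("\r\n");

console.log(pickNegotiatedCodec(answer)); // 0
```

Encoding with the first offered codec (G.722 here) would send audio the far end never agreed to receive, which is why the answer, not the offer, drives the choice.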

🚀 Getting Started

Prerequisites

  • Node.js ≥ 20 with tsx globally available
  • pnpm for package management
  • Rust toolchain (for building the proxy engine)
  • espeak-ng (optional, for TTS fallback)

Install & Build

# Install dependencies (run inside the cloned repository)
pnpm install

# Build the Rust proxy-engine binary
pnpm run buildRust

# Bundle the web frontend
pnpm run bundle

Configuration

Create .nogit/config.json:

{
  "proxy": {
    "lanIp": "192.168.1.100",          // Your server's LAN IP
    "lanPort": 5070,                    // SIP signaling port
    "publicIpSeed": "stun.example.com", // STUN server for public IP discovery
    "rtpPortRange": { "min": 20000, "max": 20200 }, // RTP port pool (even ports)
    "webUiPort": 3060                   // Dashboard + REST API port
  },
  "providers": [
    {
      "id": "my-trunk",
      "displayName": "My SIP Provider",
      "domain": "sip.provider.com",
      "outboundProxy": { "address": "sip.provider.com", "port": 5060 },
      "username": "user",
      "password": "pass",
      "codecs": [9, 0, 8, 101],        // G.722, PCMU, PCMA, telephone-event
      "registerIntervalSec": 300
    }
  ],
  "devices": [
    {
      "id": "desk-phone",
      "displayName": "Desk Phone",
      "expectedAddress": "192.168.1.50",
      "extension": "100"
    }
  ],
  "routing": {
    "routes": [
      {
        "id": "inbound-default",
        "name": "Ring all devices",
        "priority": 100,
        "direction": "inbound",
        "match": {},
        "action": {
          "targets": ["desk-phone"],
          "ringBrowsers": true,
          "voicemailBox": "main",
          "noAnswerTimeout": 25
        }
      },
      {
        "id": "outbound-default",
        "name": "Route via trunk",
        "priority": 100,
        "direction": "outbound",
        "match": {},
        "action": { "provider": "my-trunk" }
      }
    ]
  },
  "voiceboxes": [
    {
      "id": "main",
      "enabled": true,
      "greetingText": "Please leave a message after the beep.",
      "greetingVoice": "af_bella",
      "noAnswerTimeoutSec": 25,
      "maxRecordingSec": 120,
      "maxMessages": 50
    }
  ],
  "contacts": [
    { "id": "1", "name": "Alice", "number": "+491234567890", "starred": true }
  ]
}
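To make the match/action routing model in the sample config more concrete, here is a minimal sketch of how a route could be selected. It rests on two loudly stated assumptions that the README does not confirm: that a higher `priority` number wins, and that every key in `match` must equal the same key on the call. The `selectRoute` function is hypothetical.

```typescript
// Shape mirrors the "routing.routes" entries in the sample config.
interface Route {
  id: string;
  name: string;
  priority: number;
  direction: "inbound" | "outbound";
  match: Record<string, string>;
  action: Record<string, unknown>;
}

// ASSUMPTION: higher priority wins; an empty match object matches every call.
// The real matcher in siprouter may use different rules.
function selectRoute(
  routes: Route[],
  direction: Route["direction"],
  call: Record<string, string>
): Route | undefined {
  return routes
    .filter((r) => r.direction === direction)
    .filter((r) => Object.keys(r.match).every((k) => r.match[k] === call[k]))
    .sort((a, b) => b.priority - a.priority)[0];
}

const routes: Route[] = [
  {
    id: "inbound-default",
    name: "Ring all devices",
    priority: 100,
    direction: "inbound",
    match: {},
    action: { targets: ["desk-phone"], ringBrowsers: true, voicemailBox: "main", noAnswerTimeout: 25 },
  },
  {
    id: "outbound-default",
    name: "Route via trunk",
    priority: 100,
    direction: "outbound",
    match: {},
    action: { provider: "my-trunk" },
  },
];

const chosen = selectRoute(routes, "outbound", {});
console.log(chosen ? chosen.id : "no route"); // outbound-default
```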

TTS Setup (Optional)

For neural announcements and voicemail greetings, download the Kokoro TTS model:

mkdir -p .nogit/tts
curl -L -o .nogit/tts/kokoro-v1.0.onnx \
  https://github.com/mzdk100/kokoro/releases/download/V1.0/kokoro-v1.0.onnx
curl -L -o .nogit/tts/voices.bin \
  https://github.com/mzdk100/kokoro/releases/download/V1.0/voices.bin

Without the model files, TTS falls back to espeak-ng. Without either, announcements are skipped — everything else works fine.
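The fallback order described above can be summarized as a small decision function. This is only a sketch: the `TtsEnvironment` shape and the `chooseTtsBackend` name are hypothetical, and it assumes Kokoro needs both downloaded files to be present.

```typescript
type TtsBackend = "kokoro" | "espeak-ng" | "none";

// Hypothetical description of what is installed on the host.
interface TtsEnvironment {
  hasKokoroModel: boolean;  // .nogit/tts/kokoro-v1.0.onnx exists
  hasKokoroVoices: boolean; // .nogit/tts/voices.bin exists
  hasEspeakNg: boolean;     // espeak-ng is installed
}

function chooseTtsBackend(env: TtsEnvironment): TtsBackend {
  // ASSUMPTION: both Kokoro files are required for neural TTS.
  if (env.hasKokoroModel && env.hasKokoroVoices) return "kokoro";
  if (env.hasEspeakNg) return "espeak-ng";
  return "none"; // announcements are skipped, everything else keeps working
}

console.log(chooseTtsBackend({ hasKokoroModel: false, hasKokoroVoices: false, hasEspeakNg: true }));
```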

Run

pnpm start

The SIP proxy starts on the configured port, and the web dashboard is available at http://<your-ip>:3060, or at https://<your-ip>:3060 when TLS certificates are installed.

HTTPS (Optional)

Place cert.pem and key.pem in .nogit/ for TLS on the dashboard.


📂 Project Structure

siprouter/
├── ts/                            # TypeScript control plane
│   ├── sipproxy.ts                # Main entry — bootstraps everything
│   ├── config.ts                  # Config loader & validation
│   ├── proxybridge.ts             # Rust proxy-engine IPC bridge (smartrust)
│   ├── frontend.ts                # Web dashboard HTTP/WS server + REST API
│   ├── webrtcbridge.ts            # WebRTC signaling layer
│   ├── registrar.ts               # Browser softphone registration
│   ├── announcement.ts            # TTS announcement generator (espeak-ng / Kokoro)
│   ├── voicebox.ts                # Voicemail box management
│   └── call/
│       └── prompt-cache.ts        # Named audio prompt WAV management
│
├── ts_web/                        # Web frontend (Lit-based SPA)
│   ├── elements/                  # Web components (9 dashboard views)
│   └── state/                     # App state, WebRTC client, notifications
│
├── rust/                          # Rust workspace (the data plane)
│   └── crates/
│       ├── codec-lib/             # Audio codec library (Opus/G.722/PCMU/PCMA)
│       ├── sip-proto/             # Zero-dependency SIP protocol library
│       └── proxy-engine/          # Main binary — SIP engine + mixer + RTP
│
├── html/                          # Static HTML shell
├── .nogit/                        # Secrets, config, TTS models (gitignored)
└── dist_rust/                     # Compiled Rust binary (gitignored)

🎧 Audio Engine (Rust)

The proxy-engine binary handles all real-time audio processing with a 48kHz f32 internal bus — encoding and decoding happens only at leg boundaries.

Supported Codecs

| Codec | PT | Native Rate | Use Case |
|---|---|---|---|
| Opus | 111 | 48 kHz | WebRTC browsers (native float encode/decode, zero i16 quantization) |
| G.722 | 9 | 16 kHz | HD SIP devices & providers |
| PCMU (G.711 µ-law) | 0 | 8 kHz | Legacy SIP |
| PCMA (G.711 A-law) | 8 | 8 kHz | Legacy SIP |
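The codec table translates directly into a lookup structure. The sketch below uses the payload types and native rates listed above; the helper names and the 20 ms frame duration are assumptions for illustration, not values taken from the engine.

```typescript
// Values come from the codec table; helper names are illustrative.
interface CodecInfo {
  name: string;
  payloadType: number;
  sampleRateHz: number; // native audio sample rate, not the RTP clock rate
}

const CODECS: CodecInfo[] = [
  { name: "Opus", payloadType: 111, sampleRateHz: 48000 },
  { name: "G.722", payloadType: 9, sampleRateHz: 16000 },
  { name: "PCMU", payloadType: 0, sampleRateHz: 8000 },
  { name: "PCMA", payloadType: 8, sampleRateHz: 8000 },
];

const BUS_RATE_HZ = 48000;

function codecForPayloadType(pt: number): CodecInfo | undefined {
  for (const c of CODECS) {
    if (c.payloadType === pt) return c;
  }
  return undefined;
}

// Samples per frame at the codec's native rate (ASSUMPTION: 20 ms frames).
function samplesPerFrame(codec: CodecInfo, frameMs: number = 20): number {
  return (codec.sampleRateHz * frameMs) / 1000;
}

// Resampling factor between the codec rate and the 48 kHz internal bus.
function busRatio(codec: CodecInfo): number {
  return BUS_RATE_HZ / codec.sampleRateHz;
}

for (const c of CODECS) {
  console.log(c.name, samplesPerFrame(c), busRatio(c));
}
```

Under the 20 ms assumption, a G.722 frame of 320 samples becomes 960 samples on the bus (factor 3), and a G.711 frame of 160 samples also becomes 960 (factor 6).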

Audio Pipeline

Inbound:   Wire RTP → Decode → Resample to 48kHz → Denoise (RNNoise) → Mix Bus
Outbound:  Mix Bus → Mix-Minus → Resample to codec rate → Encode → Wire RTP
  • FFT-based resampling via rubato — high-quality sinc interpolation with cached resampler state for seamless inter-frame continuity
  • ML noise suppression via nnnoiseless (RNNoise) — per-leg inbound denoising with SIMD acceleration (AVX/SSE). Skipped for WebRTC legs (browsers already denoise via getUserMedia)
  • Mix-minus mixing — each participant hears everyone except themselves, accumulated in f64 precision
  • In-tick packet reorder — inbound RTP packets are sorted by sequence number before decoding, protecting G.722 ADPCM state from out-of-order delivery
  • RFC 3550 compliant header parsing — properly handles CSRC lists and header extensions
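Mix-minus is easiest to see in code. The following is a minimal sketch rather than the engine's Rust implementation: it sums all legs once, then subtracts each leg's own contribution. JavaScript numbers are f64, which mirrors the f64 accumulation mentioned above. The hard clip to [-1, 1] and the function name `mixMinus` are assumptions.

```typescript
// Each entry in `frames` is one leg's frame at the bus rate, all of equal length.
function mixMinus(frames: Float32Array[]): Float32Array[] {
  if (frames.length === 0) return [];
  const n = frames[0].length;

  // Sum every leg once, accumulating in f64.
  const total = new Float64Array(n);
  for (const frame of frames) {
    for (let i = 0; i < n; i++) total[i] += frame[i];
  }

  // Each leg hears the total minus its own signal.
  return frames.map((own) => {
    const out = new Float32Array(n);
    for (let i = 0; i < n; i++) {
      const v = total[i] - own[i];
      out[i] = Math.max(-1, Math.min(1, v)); // ASSUMPTION: simple hard clip
    }
    return out;
  });
}

const legA = new Float32Array([0.1, 0.0]);
const legB = new Float32Array([0.2, 0.5]);
const legC = new Float32Array([0.3, -0.5]);
const outputs = mixMinus([legA, legB, legC]);
console.log(outputs.map((o) => Array.prototype.slice.call(o)));
```

Summing once and subtracting keeps the cost linear in the number of legs, instead of re-mixing N-1 sources for each of N listeners.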

🗣️ Neural TTS

Announcements and voicemail greetings are synthesized using Kokoro TTS — an 82M parameter neural model running via ONNX Runtime directly in the Rust process:

  • 24 kHz, 16-bit mono output
  • 25+ voice presets — American/British, male/female (e.g., af_bella, am_adam, bf_emma, bm_george)
  • ~800ms synthesis time for a 3-second phrase
  • Lazy-loaded on first use — no startup cost if TTS is unused
  • Falls back to espeak-ng if the ONNX model is not available

📧 Voicemail

  • Configurable voicemail boxes with custom TTS greetings
  • Automatic routing on no-answer timeout
  • Recording with configurable max duration and message count
  • Web dashboard playback and management
  • WAV storage in .nogit/voicemail/

🔢 IVR (Interactive Voice Response)

  • DTMF-navigable menus with configurable entries
  • Actions: route to extension, route to voicemail, transfer, submenu, hangup, repeat prompt
  • Custom TTS prompts per menu
  • Nested menu support
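The menu model above can be sketched as a small data structure plus a digit handler. The six action kinds come from the list; every field name (`extension`, `box`, `target`, `menuId`, `entries`) and the rule that an unmapped digit repeats the prompt are assumptions, not the project's actual schema.

```typescript
// Action kinds follow the list above; field names are hypothetical.
type IvrAction =
  | { type: "extension"; extension: string }
  | { type: "voicemail"; box: string }
  | { type: "transfer"; target: string }
  | { type: "submenu"; menuId: string }
  | { type: "hangup" }
  | { type: "repeat" };

interface IvrMenu {
  id: string;
  prompt: string; // text handed to TTS
  entries: Record<string, IvrAction>; // keyed by DTMF digit
}

interface IvrStep {
  nextMenuId: string;
  action: IvrAction;
}

function handleDigit(menus: Record<string, IvrMenu>, currentMenuId: string, digit: string): IvrStep {
  const menu = menus[currentMenuId];
  const action = menu ? menu.entries[digit] : undefined;
  // ASSUMPTION: an unmapped digit replays the current prompt.
  if (!action) return { nextMenuId: currentMenuId, action: { type: "repeat" } };
  if (action.type === "submenu") return { nextMenuId: action.menuId, action: action };
  return { nextMenuId: currentMenuId, action: action };
}

const menus: Record<string, IvrMenu> = {
  main: {
    id: "main",
    prompt: "Press 1 for the desk phone, 2 for support, 0 to leave a message.",
    entries: {
      "1": { type: "extension", extension: "100" },
      "2": { type: "submenu", menuId: "support" },
      "0": { type: "voicemail", box: "main" },
    },
  },
  support: {
    id: "support",
    prompt: "Press 9 to hang up.",
    entries: { "9": { type: "hangup" } },
  },
};

console.log(handleDigit(menus, "main", "2"));
```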

🌐 Web Dashboard & REST API

Dashboard Views

| View | Description |
|---|---|
| 📊 Overview | Stats tiles: uptime, providers, devices, active calls |
| 📞 Calls | Active calls with leg details, codec info, add/remove legs, transfer, hangup |
| ☎️ Phone | Browser softphone: mic/speaker selection, audio meters, dial pad, incoming call popup |
| 🔀 Routes | Routing rule management: match/action model with priority |
| 📧 Voicemail | Voicemail box management + message playback |
| 🔢 IVR | IVR menu builder: DTMF entries, TTS prompts, nested menus |
| 👤 Contacts | Contact management with click-to-call |
| 🔌 Providers | SIP trunk configuration and registration status |
| 📋 Log | Live streaming log viewer |

REST API

| Endpoint | Method | Description |
|---|---|---|
| /api/status | GET | Full system status (providers, devices, calls, history) |
| /api/call | POST | Originate a call |
| /api/hangup | POST | Hang up a call |
| /api/call/:id/addleg | POST | Add a device leg to an active call |
| /api/call/:id/addexternal | POST | Add an external participant via provider |
| /api/call/:id/removeleg | POST | Remove a leg from a call |
| /api/transfer | POST | Transfer a call |
| /api/config | GET | Read current configuration |
| /api/config | POST | Update configuration (hot-reload) |
| /api/voicemail/:box | GET | List voicemail messages |
| /api/voicemail/:box/:id | DELETE | Delete a voicemail message |
| /api/voicemail/:box/:id/audio | GET | Stream voicemail audio |
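As a rough illustration of how a client might address these endpoints, the sketch below builds method and URL pairs for the routes that take no request body. The `makeApi` helper is hypothetical. The POST endpoints are left out on purpose, because their request bodies are not documented here and guessing them would be misleading.

```typescript
interface ApiRequest {
  method: "GET" | "DELETE";
  url: string;
}

// Hypothetical helper: builds requests for the body-less endpoints only.
function makeApi(baseUrl: string) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const enc = encodeURIComponent;
  return {
    status: (): ApiRequest => ({ method: "GET", url: `${base}/api/status` }),
    config: (): ApiRequest => ({ method: "GET", url: `${base}/api/config` }),
    listVoicemail: (box: string): ApiRequest => ({
      method: "GET",
      url: `${base}/api/voicemail/${enc(box)}`,
    }),
    voicemailAudio: (box: string, id: string): ApiRequest => ({
      method: "GET",
      url: `${base}/api/voicemail/${enc(box)}/${enc(id)}/audio`,
    }),
    deleteVoicemail: (box: string, id: string): ApiRequest => ({
      method: "DELETE",
      url: `${base}/api/voicemail/${enc(box)}/${enc(id)}`,
    }),
  };
}

// Host and port taken from the sample config (lanIp + webUiPort).
const api = makeApi("http://192.168.1.100:3060/");
console.log(api.listVoicemail("main"));

// Sending a request, in an environment that provides fetch:
//   const req = api.status();
//   const res = await fetch(req.url, { method: req.method });
//   const status = await res.json();
```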

WebSocket Events

Connect to /ws for real-time push:

{ "type": "status", "data": { ... } }           // Full status snapshot (1s interval)
{ "type": "log", "data": { "message": "..." } } // Log lines in real-time
{ "type": "incoming_call", "data": { ... } }     // Incoming call notification
{ "type": "call_ended", "data": { ... } }        // Call ended notification

🔌 Ports

| Port | Protocol | Purpose |
|---|---|---|
| 5070 (configurable) | UDP | SIP signaling |
| 20000–20200 (configurable) | UDP | RTP media (even ports, per-call allocation) |
| 3060 (configurable) | TCP | Web dashboard + WebSocket + REST API |
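The even-port, per-call RTP allocation can be sketched as a simple pool. This is an illustration, not the Rust RTP port pool itself: the class name, the inclusive upper bound, and the first-in-first-out reuse order are all assumptions. Using even ports follows the RFC 3550 convention of RTP on an even port with RTCP on the next odd one.

```typescript
// Hypothetical pool handing out even UDP ports from a configured range.
class RtpPortPool {
  private free: number[] = [];
  private used: { [port: number]: boolean } = {};

  // ASSUMPTION: both bounds are inclusive, as in rtpPortRange { min, max }.
  constructor(min: number, max: number) {
    const start = min % 2 === 0 ? min : min + 1;
    for (let p = start; p <= max; p += 2) this.free.push(p);
  }

  // Returns an even port, or null when the pool is exhausted.
  allocate(): number | null {
    const port = this.free.shift();
    if (port === undefined) return null;
    this.used[port] = true;
    return port;
  }

  // Returns a port to the pool; unknown or already-free ports are ignored.
  release(port: number): void {
    if (this.used[port]) {
      delete this.used[port];
      this.free.push(port);
    }
  }

  available(): number {
    return this.free.length;
  }
}

const pool = new RtpPortPool(20000, 20200);
console.log(pool.available()); // 101 even ports in 20000..20200
const legPort = pool.allocate();
console.log(legPort); // 20000
```

With the default range, 101 even ports are available, which bounds the number of simultaneous RTP legs if each leg takes one port.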

🛠️ Development

# Start in dev mode
pnpm start

# Build Rust proxy-engine
pnpm run buildRust

# Bundle web frontend
pnpm run bundle

# Build + bundle + restart background server
pnpm run restartBackground

License and Legal Information

This repository contains open-source code licensed under the MIT License. A copy of the license can be found in the license file.

Please note: The MIT License does not grant permission to use the trade names, trademarks, service marks, or product names of the project, except as required for reasonable and customary use in describing the origin of the work and reproducing the content of the NOTICE file.

Trademarks

This project is owned and maintained by Task Venture Capital GmbH. The names and logos associated with Task Venture Capital GmbH and any related products or services are trademarks of Task Venture Capital GmbH or third parties, and are not included within the scope of the MIT license granted herein.

Use of these trademarks must comply with Task Venture Capital GmbH's Trademark Guidelines or the guidelines of the respective third-party owners, and any usage must be approved in writing. Third-party trademarks used herein are the property of their respective owners and used only in a descriptive manner, e.g. for an implementation of an API or similar.

Company Information

Task Venture Capital GmbH
Registered at District Court Bremen HRB 35230 HB, Germany

For any legal inquiries or further information, please contact us via email at hello@task.vc.

By using this repository, you acknowledge that you have read this section, agree to comply with its terms, and understand that the licensing of the code does not imply endorsement by Task Venture Capital GmbH of any derivative works.
