TSBits["Config · WebRTC Signaling<br/>REST API · Web Dashboard<br/>Voicebox Manager · TTS Cache"]
end
subgraph Rust["Rust proxy-engine (data plane)"]
RustBits["SIP Stack · Dialog SM · Auth<br/>Call Manager · N-Leg Mixer<br/>48kHz f32 Bus · Jitter Buffer<br/>Codec Engine · RTP Port Pool<br/>WebRTC Engine · Kokoro TTS<br/>Voicemail · IVR · Recording"]
end
TS <-->|"JSON-over-stdio IPC"| Rust
end
Browser <-->|"Opus / WebRTC"| TS
Rust <-->|"SIP / RTP"| Devices
Rust <-->|"SIP / RTP"| Trunks
```
### 🧠 Key Design Decisions
@@ -71,6 +58,37 @@ siprouter sits between your SIP trunk providers and your endpoints — hardware
- **Per-Session Codec Isolation** — Each call leg gets its own encoder/decoder/resampler/denoiser state — no cross-call corruption.
- **SDP Codec Negotiation** — Outbound encoding uses the codec actually negotiated in SDP answers, not just the first offered codec.
### 📲 WebRTC Browser Call Flow
Browser calls are set up in a strict three-step dance — the WebRTC leg cannot be attached at call-creation time because the browser's session ID is only known once the SDP offer arrives:
```mermaid
sequenceDiagram
participant B as Browser
participant TS as TypeScript (sipproxy.ts)
participant R as Rust proxy-engine
participant P as SIP Provider
B->>TS: POST /api/call
TS->>R: make_call (pending call, no WebRTC leg yet)
R-->>TS: call_created
TS-->>B: webrtc-incoming (callId)
B->>TS: webrtc-offer (sessionId, SDP)
TS->>R: handle_webrtc_offer
R-->>TS: webrtc-answer (SDP)
TS-->>B: webrtc-answer
Note over R: Standalone WebRTC session<br/>(not yet attached to call)
B->>TS: webrtc_link (callId + sessionId)
TS->>R: link session → call
R->>R: wire WebRTC leg through mixer
R->>P: SIP INVITE
P-->>R: 200 OK + SDP
R-->>TS: call_answered
Note over B,P: Bidirectional Opus ↔ codec-transcoded<br/>audio flows through the mixer
```
---
## 🚀 Getting Started
@@ -246,9 +264,17 @@ The `proxy-engine` binary handles all real-time audio processing with a **48kHz
### Audio Pipeline
```
Inbound: Wire RTP → Jitter Buffer → Decode → Resample to 48kHz → Denoise (RNNoise) → Mix Bus
Outbound: Mix Bus → Mix-Minus → Resample to codec rate → Encode → Wire RTP
- **Adaptive jitter buffer** — per-leg `BTreeMap`-based buffer keyed by RTP sequence number. Delivers exactly one frame per 20ms mixer tick in sequence order. Adaptive target depth starts at 3 frames (60ms) and adjusts between 2–6 frames based on observed network jitter. Handles hold/resume by detecting large forward sequence jumps and resetting cleanly.
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.