
Feature Roadmap

Implemented

SIP Signalling

SIP User Agent Client (UAC) over TLS/TCP (RFC 3261). Handles incoming INVITE, BYE, ACK, CANCEL, and OPTIONS requests, carrier REGISTER with digest authentication (RFC 8760: MD5, SHA-256, SHA-512/256), and double-CRLF keepalive ping/pong (RFC 5626 §4.4.1).
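As an illustration of the digest computation behind REGISTER authentication, here is a minimal sketch using only hashlib. It is not this project's code: the function name digest_response and the ALGORITHMS table are hypothetical, and it assumes qop=auth. SHA-512/256 support depends on the OpenSSL build behind hashlib.

```python
import hashlib

# Digest algorithm tokens mapped to hashlib names (illustrative table).
# "sha512_256" is only available when the linked OpenSSL provides it;
# hashlib.new raises ValueError otherwise.
ALGORITHMS = {"MD5": "md5", "SHA-256": "sha256", "SHA-512-256": "sha512_256"}


def _h(algorithm: str, data: str) -> str:
    """Hex digest of data under the named digest algorithm."""
    return hashlib.new(ALGORITHMS[algorithm], data.encode()).hexdigest()


def digest_response(*, algorithm, username, realm, password,
                    method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the `response` value for a digest challenge with qop=auth."""
    ha1 = _h(algorithm, f"{username}:{realm}:{password}")
    ha2 = _h(algorithm, f"{method}:{uri}")
    return _h(algorithm, f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

For a SIP registration, method would be "REGISTER" and uri the request URI of the registrar; the remaining fields come from the server's challenge and the client's own cnonce and nonce count.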

Media Transport (RTP/SRTP)

Full RTP packet parsing and per-call multiplexing (RFC 3550). SRTP encryption and authentication with AES_CM_128_HMAC_SHA1_80 (RFC 3711), with SDES key exchange carried inline in the SDP a=crypto: attribute (RFC 4568). First-byte STUN/RTP demultiplexing (RFC 7983).
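The fixed RTP header and the first-byte demultiplexing rule can be sketched as follows. This is an illustrative example, not the project's parser: RTPHeader, classify, and parse_rtp_header are hypothetical names, and header extensions, padding, and SRTP processing are deliberately left out.

```python
import struct
from dataclasses import dataclass


@dataclass
class RTPHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrc: tuple


def classify(datagram: bytes) -> str:
    """Classify a datagram by its first byte, using the RFC 7983 ranges."""
    if not datagram:
        return "unknown"
    first = datagram[0]
    if first <= 3:
        return "stun"
    if 128 <= first <= 191:
        return "rtp"  # RTP or RTCP
    return "other"


def parse_rtp_header(datagram: bytes) -> RTPHeader:
    """Parse the 12-byte fixed RTP header plus the CSRC list (RFC 3550)."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the fixed RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack_from("!BBHII", datagram)
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    csrc_count = b0 & 0x0F
    if len(datagram) < 12 + 4 * csrc_count:
        raise ValueError("datagram truncated inside the CSRC list")
    csrc = struct.unpack_from(f"!{csrc_count}I", datagram, 12)
    return RTPHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc=csrc,
    )
```

A receive loop would call classify first and hand "stun" datagrams to the STUN handler and "rtp" datagrams to the per-call RTP path.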

NAT Traversal (STUN)

STUN Binding Request / Response with XOR-MAPPED-ADDRESS for RTP public address discovery (RFC 5389). Uses Cloudflare's STUN server by default; configurable or disabled per session.
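To make the address discovery step concrete, here is a minimal sketch of building a Binding Request and decoding an IPv4 XOR-MAPPED-ADDRESS value per RFC 5389. The function names are hypothetical and the sketch omits the socket exchange, attribute iteration, and IPv6.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001


def build_binding_request(transaction_id=None) -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: message type, message length (0, no attributes), magic cookie.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_address(value: bytes):
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    if len(value) < 8:
        raise ValueError("attribute value too short")
    _reserved, family, xport, xaddr = struct.unpack_from("!BBHI", value)
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    # The port is XORed with the top 16 bits of the magic cookie,
    # the address with the full 32-bit cookie.
    port = xport ^ (MAGIC_COOKIE >> 16)
    address = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return address, port
```

The decoded pair is the public address that the STUN server observed, which is what would be advertised in the SDP for RTP.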

Session Description (SDP)

Offer / answer model for audio calls. Codec negotiation for Opus (RFC 7587), G.722, PCMU, and PCMA (RFC 3551). Full SDP lexer with Pygments syntax highlighting.
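Codec negotiation on the answering side can be sketched as below. This is an assumption-laden example rather than the project's negotiation logic: the function names, the local preference order, and the choice to let local preference win over the offer's order are all illustrative, and only the first audio media line is considered.

```python
# Static payload types from RFC 3551 that may appear without an rtpmap line.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 9: "G722"}

# Assumed local preference order, best first.
LOCAL_PREFERENCE = ["OPUS", "G722", "PCMU", "PCMA"]


def parse_audio_codecs(sdp: str):
    """Return (payload type, encoding name) pairs in the offer's order."""
    payload_types = []
    rtpmap = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio ") and not payload_types:
            # m=audio <port> <proto> <fmt> ...
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.split("/")[0].upper()
    return [
        (pt, rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt, ""))
        for pt in payload_types
    ]


def negotiate(sdp: str):
    """Pick the most preferred local codec that the offer also lists."""
    offered = parse_audio_codecs(sdp)
    for name in LOCAL_PREFERENCE:
        for pt, offered_name in offered:
            if offered_name == name:
                return pt, name
    return None  # no codec in common
```

The returned payload type is the one the answer would echo back in its own m=audio line.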

Audio Codecs

Inbound decoding and outbound encoding via PyAV for all four negotiated codecs (Opus, G.722, PCMU, PCMA). Audio is resampled to 16 kHz float32 PCM for downstream processing.

Speech Transcription

Energy-based voice activity detection (VAD) with configurable silence gap. Utterances are transcribed in a thread pool via faster-whisper; the default model is kyutai/stt-1b-en_fr-trfs.
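The idea of energy-based VAD with a silence gap can be sketched in pure Python. This is a simplified illustration, not the project's detector: the names rms and split_utterances, the default threshold, and measuring the gap in frames rather than seconds are all assumptions.

```python
import math


def rms(frame) -> float:
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_utterances(frames, threshold=0.01, silence_gap=3):
    """Yield lists of frames, one list per utterance.

    An utterance ends once `silence_gap` consecutive frames fall below
    `threshold`. Shorter pauses stay inside the utterance, and trailing
    silence is dropped.
    """
    voiced, pending = [], []
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech resumed: keep the short pause that preceded it.
            voiced.extend(pending)
            pending = []
            voiced.append(frame)
        elif voiced:
            pending.append(frame)
            if len(pending) >= silence_gap:
                yield voiced
                voiced, pending = [], []
    if voiced:
        yield voiced
```

Each yielded utterance could then be concatenated and submitted to a thread pool for transcription, so the audio path is never blocked by the model.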

AI Voice Agent

LLM response loop powered by Ollama, with streaming TTS via Pocket TTS and real-time RTP delivery. Chat history is maintained across turns. Inbound speech during a response cancels the current reply and hands control back to the caller.
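The interruption behaviour described above maps naturally onto asyncio task cancellation. The sketch below is hypothetical and does not call Ollama or Pocket TTS: Responder and its speak callback are illustrative names standing in for whatever generates and streams a reply.

```python
import asyncio


class Responder:
    """Runs one reply at a time and lets inbound speech cancel it."""

    def __init__(self, speak):
        # `speak` is a hypothetical coroutine function that generates a
        # reply and streams it to the caller.
        self._speak = speak
        self._task = None

    def respond(self, text):
        """Start a new reply, cancelling any reply still in progress."""
        self.interrupt()
        self._task = asyncio.create_task(self._speak(text))
        return self._task

    def interrupt(self):
        """Called when the VAD detects caller speech during a reply."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
```

Because cancellation is delivered at the next await inside speak, a reply that sends audio in small chunks stops within roughly one chunk of the caller starting to talk.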

CLI

voip sip <aor> transcribe: live call transcription to stdout. voip sip <aor> agent: AI voice agent. SIP message syntax highlighting via a Pygments lexer.