Feature Roadmap

Implemented

SIP Signalling

SIP User Agent Client (UAC) over TLS/TCP (RFC 3261). Handles incoming INVITE, BYE, ACK, CANCEL, and OPTIONS requests, carrier REGISTER with digest authentication (RFC 8760: MD5, SHA-256, SHA-512/256), and double-CRLF keepalive ping/pong (RFC 5626 §4.4.1). Client-initiated keepalive pings, the Supported: outbound header, the ;ob Contact parameter (RFC 5626 §5), and automatic reconnection with exponential back-off keep long-running sessions alive.
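As a rough illustration of the digest step in REGISTER, the sketch below computes the `response` value for `qop=auth`. The function name `digest_response` and its parameters are hypothetical and not taken from the project; SHA-512/256 is left out because its availability in `hashlib` depends on the linked OpenSSL build.

```python
import hashlib

# Digest algorithm tokens mapped to hashlib names.
# SHA-512/256 is omitted here: hashlib only offers it on some OpenSSL builds.
_HASHES = {"MD5": "md5", "SHA-256": "sha256"}


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth", algorithm="MD5"):
    """Compute the Digest `response` value for qop=auth (hypothetical helper)."""

    def h(text: str) -> str:
        return hashlib.new(_HASHES[algorithm], text.encode()).hexdigest()

    ha1 = h(f"{username}:{realm}:{password}")  # credentials hash
    ha2 = h(f"{method}:{uri}")                 # request hash
    return h(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

For a SIP registration the `method` would be `REGISTER` and the `uri` the registrar's request URI; the nonce and realm come from the challenge in the 401 response.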

Media Transport (RTP/SRTP)

Full RTP packet parsing and per-call multiplexing (RFC 3550). SRTP encryption and authentication with AES_CM_128_HMAC_SHA1_80 (RFC 3711), with SDES key exchange carried inline in the SDP a=crypto: attribute (RFC 4568). First-byte STUN/RTP demultiplexing (RFC 7983).
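A minimal sketch of the first-byte demultiplexing and fixed-header parsing described above. The names `classify`, `parse_rtp`, and `RtpPacket` are illustrative assumptions, not the project's API, and SRTP decryption is not shown.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def classify(datagram: bytes) -> str:
    """Classify a UDP datagram by its first byte (ranges from RFC 7983)."""
    if not datagram:
        return "unknown"
    first = datagram[0]
    if first <= 3:
        return "stun"
    if 128 <= first <= 191:
        return "rtp"  # this range is shared by RTP and RTCP
    return "unknown"


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Parse the RFC 3550 fixed header, skipping CSRCs and any extension."""
    if len(datagram) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (b0 & 0x0F)           # skip CSRC identifiers
    if b0 & 0x10:                           # header extension present
        if len(datagram) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", datagram[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(datagram)
    if b0 & 0x20:                           # padding: last byte is its length
        end -= datagram[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")
    return RtpPacket(b1 & 0x7F, bool(b1 & 0x80), seq, ts, ssrc,
                     datagram[offset:end])
```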

NAT Traversal (STUN)

STUN Binding Request / Response with XOR-MAPPED-ADDRESS and MAPPED-ADDRESS for RTP public address discovery (RFC 5389). Full IPv4 and IPv6 address parsing for both attribute types. Uses Cloudflare's STUN server by default; configurable or disabled per session.
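The sketch below shows how a Binding Request header can be built and how an XOR-MAPPED-ADDRESS value is un-XORed for both address families, following RFC 5389. The function names are hypothetical, and attribute framing, retransmission, and the socket exchange are omitted.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte Binding Request with no attributes."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_address(value: bytes,
                              transaction_id: bytes) -> Tuple[str, int]:
    """Decode the value of an XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport = struct.unpack("!BBH", value[:4])
    port = xport ^ (MAGIC_COOKIE >> 16)
    cookie = struct.pack("!I", MAGIC_COOKIE)
    if family == 0x01:      # IPv4: XOR with the magic cookie only
        key, size = cookie, 4
    elif family == 0x02:    # IPv6: XOR with cookie + transaction ID
        key, size = cookie + transaction_id, 16
    else:
        raise ValueError(f"unknown address family {family:#x}")
    raw = bytes(a ^ b for a, b in zip(value[4:4 + size], key))
    return str(ipaddress.ip_address(raw)), port
```

A plain MAPPED-ADDRESS value has the same layout without the XOR step, so the port and address bytes can be read directly.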

Session Description (SDP)

Offer / answer model for audio calls. Codec negotiation for Opus (RFC 7587), G.722, PCMU, and PCMA (RFC 3551). Full SDP lexer with Pygments syntax highlighting. IPv6 connection addresses advertised with IP6 address type per RFC 4566 §5.7.
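To illustrate the negotiation step, here is a small sketch that reads the codecs from an offer's first audio section and picks one. The preference order and all function names are assumptions made for the example; the project's actual ordering is not stated in this document.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Static payload types from RFC 3551 that may appear without an a=rtpmap line.
STATIC_PAYLOADS = {0: "PCMU/8000", 8: "PCMA/8000", 9: "G722/8000"}
# Hypothetical preference order, chosen only for this example.
PREFERENCE = ["opus", "G722", "PCMU", "PCMA"]


def offered_codecs(sdp: str) -> Dict[int, str]:
    """Map payload type to encoding for the first m=audio section."""
    codecs: Dict[int, str] = {}
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if in_audio:
                break  # a later media section starts; stop reading
            in_audio = line.startswith("m=audio ")
            if in_audio:
                for fmt in line.split()[3:]:
                    codecs[int(fmt)] = STATIC_PAYLOADS.get(int(fmt), "")
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            if int(pt) in codecs:
                codecs[int(pt)] = encoding
    return codecs


def choose_codec(sdp: str) -> Optional[Tuple[int, str]]:
    """Return the first offered codec that matches the preference list."""
    offered = offered_codecs(sdp)
    for name in PREFERENCE:
        for pt, encoding in offered.items():
            if encoding.split("/")[0].lower() == name.lower():
                return pt, encoding
    return None


def connection_line(address: str) -> str:
    """Build the c= line with IP4 or IP6 as the address type."""
    version = ipaddress.ip_address(address).version
    return f"c=IN IP{version} {address}"
```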

IPv6

Full dual-stack support across SIP signalling, RTP media, and STUN discovery. IPv6 addresses in SIP URIs and Via/Contact headers are wrapped in square brackets per RFC 2732. The RTP UDP socket is bound to :: when the SIP signalling connection is over IPv6. STUN MAPPED-ADDRESS and XOR-MAPPED-ADDRESS attributes with the IPv6 address family are decoded per RFC 5389 §15.1 and §15.2.
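The bracket rule and the bind-address choice can be sketched as follows. Both helper names are hypothetical, and the `0.0.0.0` fallback for IPv4 signalling is an assumption, since the text only states the IPv6 case.

```python
import ipaddress


def uri_host(host: str) -> str:
    """Format a host for a SIP URI or Via/Contact header."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so treat it as a DNS name
    return f"[{ip}]" if ip.version == 6 else str(ip)


def rtp_bind_address(signalling_peer: str) -> str:
    """Pick the RTP bind address from the signalling peer's IP family."""
    if ipaddress.ip_address(signalling_peer).version == 6:
        return "::"
    return "0.0.0.0"  # assumed IPv4 counterpart, not stated in the text
```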

Audio Codecs

Inbound decoding and outbound encoding via PyAV for all four negotiated codecs (Opus, G.722, PCMU, PCMA). Audio is resampled to 16 kHz float32 PCM for downstream processing.
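The project delegates decoding and resampling to PyAV. Purely to show what that step produces for the simplest codec, the sketch below expands G.711 μ-law (PCMU) bytes to floats and doubles the sample rate from 8 kHz to 16 kHz with crude linear interpolation. It is a stand-in written for illustration, not the PyAV code path, and the function names are hypothetical.

```python
from typing import List


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit-range sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_pcmu(payload: bytes) -> List[float]:
    """Decode a PCMU RTP payload to floats in the range [-1.0, 1.0)."""
    return [ulaw_to_linear(b) / 32768.0 for b in payload]


def upsample_2x(samples: List[float]) -> List[float]:
    """Double the sample rate by inserting midpoints (no anti-alias filter)."""
    out: List[float] = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.extend((sample, (sample + nxt) / 2))
    return out
```

A real resampler applies proper filtering; the interpolation here only illustrates the rate change.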

Speech Transcription

Energy-based voice activity detection (VAD) with configurable silence gap. Utterances are transcribed in a thread pool via faster-whisper; the default model is kyutai/stt-1b-en_fr-trfs.
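A minimal sketch of energy-based VAD with a silence gap, assuming audio arrives as fixed-size frames of float samples. The threshold, the gap length in frames, and the function names are illustrative assumptions rather than the project's defaults.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_utterances(frames: Sequence[Frame], threshold: float = 0.01,
                     silence_gap: int = 3) -> List[List[Frame]]:
    """Group frames into utterances.

    An utterance starts at the first frame above `threshold` and ends once
    `silence_gap` consecutive quiet frames have been seen. Shorter pauses
    stay inside the utterance.
    """
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)
            quiet = 0
        elif current:
            quiet += 1
            if quiet >= silence_gap:
                utterances.append(current)
                current, quiet = [], 0
            else:
                current.append(frame)
    if current:
        utterances.append(current)
    return utterances
```

Each returned utterance could then be handed to a worker, for example through `concurrent.futures.ThreadPoolExecutor.submit`, so transcription does not block packet handling.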

AI Voice Agent

LLM response loop powered by Ollama, with streaming TTS via Pocket TTS and real-time RTP delivery. Chat history is maintained across turns. Inbound speech during a response cancels the current reply and hands control back to the caller.
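The barge-in behaviour can be sketched with plain asyncio: the reply runs as a task and is cancelled as soon as a speech event fires. `speak` stands in for streaming TTS and RTP delivery, and every name here is hypothetical; the Ollama and Pocket TTS calls are deliberately left out.

```python
import asyncio
from typing import List, Tuple


async def speak(chunks: List[str], sent: List[str], delay: float = 0.05) -> None:
    """Stand-in for streaming TTS: 'delivers' one chunk per delay tick."""
    for chunk in chunks:
        await asyncio.sleep(delay)
        sent.append(chunk)


async def respond_with_barge_in(chunks: List[str],
                                speech_detected: asyncio.Event
                                ) -> Tuple[List[str], bool]:
    """Play a reply, cancelling it if the caller starts speaking."""
    sent: List[str] = []
    reply = asyncio.create_task(speak(chunks, sent))
    barge_in = asyncio.create_task(speech_detected.wait())
    done, _pending = await asyncio.wait(
        {reply, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and not reply.done()
    for task in (reply, barge_in):
        if not task.done():
            task.cancel()
    # Collect both tasks so cancellation does not leak as a warning.
    await asyncio.gather(reply, barge_in, return_exceptions=True)
    return sent, interrupted
```

In a real loop the VAD would set `speech_detected`, and the chunks actually delivered (`sent`) could be what gets recorded in the chat history for the interrupted turn.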

CLI

voip sip <aor> transcribe: live call transcription to stdout.

voip sip <aor> agent: AI voice agent.

SIP message syntax highlighting via a Pygments lexer.