Feature Roadmap
Implemented
SIP Signalling
SIP User Agent Client (UAC) over TLS/TCP (RFC 3261). Handles incoming
INVITE, BYE, ACK, CANCEL, and OPTIONS requests, carrier
REGISTER with digest authentication (RFC 8760: MD5, SHA-256,
SHA-512/256), and double-CRLF keepalive ping/pong ([RFC 5626 ยง4.4.1]).
Media Transport (RTP/SRTP)
Full RTP packet parsing and per-call multiplexing (RFC 3550). SRTP
encryption and authentication with AES_CM_128_HMAC_SHA1_80 (RFC 3711),
with SDES key exchange carried inline in the SDP a=crypto: attribute
(RFC 4568). First-byte STUN/RTP demultiplexing (RFC 7983).
NAT Traversal (STUN)
STUN Binding Request / Response with XOR-MAPPED-ADDRESS for RTP public
address discovery (RFC 5389). Uses Cloudflare's STUN server by default;
configurable or disabled per session.
Session Description (SDP)
Offer / answer model for audio calls. Codec negotiation for Opus (RFC 7587), G.722, PCMU, and PCMA (RFC 3551). Full SDP lexer with Pygments syntax highlighting.
Audio Codecs
Inbound decoding and outbound encoding via PyAV for all four negotiated codecs (Opus, G.722, PCMU, PCMA). Audio is resampled to 16 kHz float32 PCM for downstream processing.
Speech Transcription
Energy-based voice activity detection (VAD) with configurable silence gap.
Utterances are transcribed in a thread pool via faster-whisper โ
default model kyutai/stt-1b-en_fr-trfs.
AI Voice Agent
LLM response loop powered by Ollama, with streaming TTS via Pocket TTS and real-time RTP delivery. Chat history is maintained across turns. Inbound speech during a response cancels the current reply and hands control back to the caller.
CLI
voip sip <aor> transcribe โ live call transcription to stdout.
voip sip <aor> agent โ AI voice agent.
SIP message syntax highlighting via a Pygments lexer.