Releases: PatterAI/Patter
v0.6.5
0.6.5
Patter as a voice shell in front of an OpenAI-compatible agent runtime.
Added
- `OpenAICompatibleLLM` / `HermesLLM` / `OpenClawLLM` pipeline-mode LLM providers (Python + TypeScript): drive a phone call where each turn is answered by an external agent runtime at `POST {base_url}/chat/completions` (Hermes, OpenClaw, Ollama, vLLM, LM Studio).
- Opt-in session continuity: Hermes `X-Hermes-Session-Id` (per call) + optional `X-Hermes-Session-Key` (long-term memory scope); OpenClaw `user` + `x-openclaw-session-key`.
- Opt-in `llm_error_message` / `llmErrorMessage` spoken fallback when the LLM stream errors (gateway down / timeout) before any audio has reached the caller.
- Backward-compat `call_id` introspection guard so custom providers without `call_id` / `**kwargs` keep working.
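The backward-compat guard in the last bullet can be pictured as a signature check made before `call_id` is passed to a provider. The following is a minimal sketch under stated assumptions: `accepts_call_id`, `call_stream`, and both provider classes are hypothetical names invented for illustration, and the SDK's real check may differ.

```python
import inspect


def accepts_call_id(fn) -> bool:
    """Return True if `fn` can receive a `call_id` keyword argument."""
    for p in inspect.signature(fn).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs absorbs any keyword
        if p.name == "call_id" and p.kind is not inspect.Parameter.POSITIONAL_ONLY:
            return True
    return False


class LegacyProvider:
    # Hypothetical custom provider written before `call_id` existed.
    def stream(self, messages):
        return ["legacy:" + messages[-1]]


class SessionAwareProvider:
    # Hypothetical provider that opts into per-call session continuity.
    def stream(self, messages, call_id=None):
        return [f"{call_id}:" + messages[-1]]


def call_stream(provider, messages, call_id):
    """Pass `call_id` only to providers whose signature can take it."""
    if accepts_call_id(provider.stream):
        return provider.stream(messages, call_id=call_id)
    return provider.stream(messages)
```

Inspecting the signature once, rather than catching `TypeError` on every call, avoids masking a genuine `TypeError` raised inside the provider.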
v0.6.4
Patter 0.6.4 — published to PyPI and npm.
Highlights
- Server-managed OpenAI Realtime turn-taking. For the `OpenAIRealtime` (v1) and `OpenAIRealtime2` (GA) engines the OpenAI server now owns VAD, end-of-turn, response creation, and barge-in. This gives lower latency and prompt barge-in, and the model now replies on the server audio-commit instead of waiting for the Whisper transcript. The transcript becomes pure (live, ordered) observability, so a hallucination can no longer suppress a real reply. Opt back into the legacy client-managed path with `gateResponseOnTranscript` / `gate_response_on_transcript`. (#154)
- Fixed: Twilio + OpenAI Realtime garbled/static audio on all models. Both Realtime engines now route through the GA adapter with correct PCM24↔mulaw8 transcoding. (#154)
- Tune the Realtime VAD for noisy lines: new `RealtimeTurnDetection` (`threshold` / `semantic_vad` eagerness) and input `noise_reduction` (`far_field`) stop speakerphone / room noise from cutting the agent off.
- Built-in `consult` escalation tool + native OpenClaw / OpenAI-compatible target: give an in-call agent an on-demand bridge to your back-office agent, with a post-call notify hook.
- Long-running tools no longer drop the call: per-tool execution `timeout` (up to 300 s), `tool_call_preambles`, and `reassurance` filler keep the line alive during 30–60 s tool calls.
- Dashboard security: auto-protected with a generated token when exposed beyond `127.0.0.1`; `allow_insecure_dashboard` escape hatch. Plus Plivo carrier support in the dashboard UI.
Full details: CHANGELOG.md · PR #156
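To illustrate the long-running-tool behaviour described in the highlights, here is a rough sketch of a bounded tool call that speaks filler while it waits. It uses only `asyncio`; `run_tool_with_reassurance`, its parameters, and the filler sentence are hypothetical and do not reflect the SDK's internals.

```python
import asyncio


async def run_tool_with_reassurance(tool, *, timeout_s, filler_every_s, say):
    """Await `tool()` for at most `timeout_s`, calling `say()` while it is pending."""
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(tool())
    deadline = loop.time() + timeout_s
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            task.cancel()
            raise asyncio.TimeoutError("tool exceeded its execution timeout")
        done, _ = await asyncio.wait({task}, timeout=min(filler_every_s, remaining))
        if done:
            return task.result()
        if loop.time() < deadline:
            # Keep the line alive instead of leaving the caller in silence.
            say("Still working on that, one moment.")


async def _demo():
    spoken = []

    async def slow_lookup():
        await asyncio.sleep(0.25)  # stands in for a slow back-office call
        return {"order": "shipped"}

    result = await run_tool_with_reassurance(
        slow_lookup, timeout_s=2.0, filler_every_s=0.1, say=spoken.append
    )
    return result, spoken


result, spoken = asyncio.run(_demo())
```

The filler interval and the hard timeout are kept separate so that reassurance can be frequent without shortening how long the tool is allowed to run.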
v0.6.3
Highlights
Added
- Completion-aware outbound calls: `call(wait=True)` → `CallResult` (both SDKs). Place a call and `await` its real outcome in one line instead of hand-wiring `on_call_end` / `onCallEnd` to an event. `wait` defaults to `false` (fire-and-forget, unchanged), so it's fully backward compatible. `CallResult` carries `outcome` (`answered` / `voicemail` / `no_answer` / `busy` / `failed`), `status`, `duration`, `transcript`, `cost`, and `metrics`, every value from a real carrier signal.
- Guaranteed teardown: `async with Patter(...)` (Python) / `await using` via `[Symbol.asyncDispose]` (TypeScript). Exiting always runs `disconnect()`, closing the billing leak where a stray TTS WebSocket kept billing after the SDK was done.
- Plivo as a third telephony carrier (both SDKs) with full Twilio/Telnyx parity: V3 HMAC-SHA256 webhook signature verification (fails closed), bidirectional media WebSocket, native `sendDTMF`, voicemail drop, CDR-reconciled pricing. `call(wait=True)` resolves for Plivo too. Thanks @amalshaji-plivo (#121).
- Contributor guardrails: `AGENTS.md`, PR template + `CONTRIBUTING.md` pre-PR checklist (CHANGELOG, both-SDK parity, `pr-validate.sh`, notebook parity).
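The guaranteed-teardown item relies on Python's async context manager protocol. As a minimal sketch of that mechanism, the `VoiceClient` class below is a hypothetical stand-in, not the real `Patter` class; it only shows why `disconnect()` runs even when the body raises.

```python
import asyncio


class VoiceClient:
    """Hypothetical stand-in for an SDK client that owns network resources."""

    def __init__(self):
        self.connected = False
        self.disconnect_calls = 0

    async def connect(self):
        self.connected = True

    async def disconnect(self):
        self.connected = False
        self.disconnect_calls += 1

    async def __aenter__(self):
        await self.connect()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike.
        await self.disconnect()
        return False  # do not swallow the caller's exception


async def _demo():
    client = VoiceClient()
    try:
        async with client:
            raise RuntimeError("call handler crashed")
    except RuntimeError:
        pass
    return client


client = asyncio.run(_demo())
```

Because `__aexit__` returns `False`, the original exception still propagates after cleanup, so failures stay visible to the caller.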
Fixed
- Plivo + Pipeline + ElevenLabs garbled/static outbound audio (TypeScript): added the `plivo → ulaw_8000` native-format case so already-μ-law TTS bytes aren't re-encoded. Python was unaffected.
- `PatterTool` (Python) always reported `cost_usd=None` / `duration_seconds=0.0`: the result builder probed the `CallMetrics` payload as a dict; rebuilt on the `call(wait=True)` path.
Install: pip install getpatter==0.6.3 · npm install getpatter@0.6.3
Full changelog: CHANGELOG.md → 0.6.3. PRs: #120, #121.
v0.6.2 — GA Realtime adapter + 14-bug fix wave + dashboard hardening
Bundled fix wave validated live via PSTN against the inbound and outbound OpenAI Realtime paths. New OpenAIRealtime2 engine + adapter (Python parity with TS) speaks the GA Realtime API (gpt-realtime-2), with bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding for Twilio / Telnyx.
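For readers unfamiliar with the transcoding step mentioned above, the sketch below decodes G.711 μ-law bytes to 16-bit PCM and triples the sample rate from 8 kHz to 24 kHz. It is an illustration only: the function names are hypothetical, linear interpolation is a simplification (a production resampler would normally apply a low-pass filter), and nothing here is taken from the adapter's source.

```python
import struct


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


def upsample_3x(samples):
    """Linearly interpolate 8 kHz samples to 24 kHz (three outputs per input)."""
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + 2 * step // 3))
    return out


def mulaw8k_to_pcm24k(payload: bytes) -> bytes:
    """Convert mu-law 8 kHz bytes to little-endian 16-bit PCM at 24 kHz."""
    pcm = upsample_3x([ulaw_to_linear(b) for b in payload])
    return struct.pack(f"<{len(pcm)}h", *pcm)
```

A 20 ms telephony frame of 160 μ-law bytes becomes 480 samples, or 960 bytes, of 24 kHz PCM.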
Highlights
- New `OpenAIRealtime2` engine + `OpenAIRealtime2Adapter` (Python parity with TS): speaks the GA `session.update` shape (`session.type = "realtime"`, nested `audio.{input,output}`, `output_modalities`), with bidirectional mulaw ↔ PCM 24 kHz transcoding.
- Inbound caller/callee via TwiML `<Parameter>`: Twilio strips URL query params before the WS handshake, so caller/callee now travel as `<Parameter>` children of `<Stream>`. Inbound calls in the dashboard now show the correct numbers.
- Whisper hallucination filter on the Realtime `transcript_input` event: drops "Thank you for watching.", "[music]", and 13 other YouTube-caption fallbacks that Whisper emits on silence/echo, eliminating phantom user turns.
- Deferred `response.create` with new `request_response()` method: `turn_detection.create_response: false` + `interrupt_response: false`, so Patter drives the assistant turn ONLY after the hallucination filter accepts the user transcript.
- VAD threshold raised 0.1 → 0.5 on the GA Realtime path: kills the phantom barge-in loop where carrier-loopback echo of the agent's own audio tripped server VAD.
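The filter-then-respond flow from the two hallucination bullets can be sketched as follows. This is an assumption-laden illustration: `TurnDriver`, `is_hallucination`, and the normalisation rules are hypothetical, the phrase set contains only the two examples quoted in these notes (the other 13 are not listed here), and treating an empty transcript as a hallucination is a choice made for the sketch.

```python
# Only the two phrases quoted in the release notes; the full list is not shown there.
KNOWN_HALLUCINATIONS = {"thank you for watching", "[music]"}


def normalize(text: str) -> str:
    """Lowercase, trim whitespace, and drop trailing sentence punctuation."""
    return text.strip().lower().rstrip(".!?")


def is_hallucination(transcript: str) -> bool:
    cleaned = normalize(transcript)
    # Assumption: an empty transcript is treated like a hallucination.
    return cleaned == "" or cleaned in KNOWN_HALLUCINATIONS


class TurnDriver:
    """Hypothetical driver that asks for a reply only for accepted transcripts."""

    def __init__(self, request_response):
        self._request_response = request_response

    def on_transcript(self, transcript: str) -> bool:
        if is_hallucination(transcript):
            return False  # phantom turn: no assistant response is requested
        self._request_response()
        return True
```

Injecting `request_response` as a callable keeps the gating logic testable without a live Realtime session.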
Persistence + dashboard
- `persist` default flipped OFF → ON in both SDKs. Calls now survive process restarts by default (path: `~/Library/Application Support/patter` on macOS / XDG dir on Linux / `%LOCALAPPDATA%` on Windows). Migration: pass `persist=False` for the old ephemeral-RAM-only behaviour.
- `PATTER_LOG_REDACT_PHONE` default flipped `mask` → `full` so the dashboard UI reveal toggle works (you can't unmask numbers the SDK never knew). Migration: set `PATTER_LOG_REDACT_PHONE=mask` for setups that ship logs off-host.
- `direction` now persisted in `metadata.json`: fixes outbound calls rendering as inbound (and the top-bar showing the callee instead of the Patter number) after restart.
- `aggregates.sdk_version` field: the SPA top-bar version chip now auto-derives from `getpatter.__version__` / `package.json#version`.
- Python `hydrate()` now backfills `transcript` from the sibling `transcript.jsonl` when `metadata.json` has no array (TS already did).
- `record_call_end` preserves the live `turns` array + falls back to the active/existing transcript when the SDK end-of-call snapshot is empty.
- Standalone dashboard (`patter dashboard`) now sees outbound dials in real time via `notify_dashboard` relay of `call_initiated`.
Prewarm + adoption hardening
- Liveness check rewritten to handle the current `websockets` library (`state` enum + `close_code` checks; legacy `closed` fallback). Pre-fix, `getattr(ws, "closed", True)` defaulted to "dead" on the new client and silently aborted every adoption.
- Application-level keepalive on the parked GA Realtime WS (`session.update` every 3 s + WS PING every 4 s): OpenAI's GA edge closes idle sockets within ~6-7 s.
- `cancel_response` is now a no-op when no item is in flight, which eliminates `response_cancel_not_active` ERROR spam on every phantom VAD `speech_started`.
- `firstMessage` no longer truncated by loopback echo VAD: the barge-in gate consults `_current_response_first_audio_at` on the adapter.
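The liveness-check fix in the first bullet comes down to the order in which attributes are probed. Below is a hedged sketch of that idea using duck-typed stand-ins instead of real `websockets` objects; `is_ws_alive` is a hypothetical name, and returning "dead" for an unrecognised object is a choice made for this sketch.

```python
from types import SimpleNamespace


def is_ws_alive(ws) -> bool:
    """Best-effort liveness check that works on old and new connection objects."""
    state = getattr(ws, "state", None)
    if state is not None:
        # Newer objects: require an OPEN state and no recorded close code.
        name = getattr(state, "name", str(state))
        return name == "OPEN" and getattr(ws, "close_code", None) is None
    # Legacy objects: fall back to the boolean `closed` attribute.
    closed = getattr(ws, "closed", None)
    if closed is None:
        return False  # unknown object shape: assume unusable
    return not closed


# Duck-typed stand-ins so the sketch runs without the `websockets` package.
new_open = SimpleNamespace(state=SimpleNamespace(name="OPEN"), close_code=None)
new_closed = SimpleNamespace(state=SimpleNamespace(name="CLOSED"), close_code=1000)
legacy_open = SimpleNamespace(closed=False)
```

With the pre-fix expression, `getattr(new_open, "closed", True)` evaluates to `True`, so a healthy connection without a `closed` attribute was reported as dead.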
TS surface additions
- Top-level re-exports of `OpenAIRealtimeModel`, `OpenAITranscriptionModel`, `OpenAIRealtimeAudioFormat`, `OpenAIVoice`, `ElevenLabsModel`, `ElevenLabsOutputFormat`, `DeepgramModel`, `CartesiaTTSModel`, `RimeModel`, `PricingUnit`, `PRICING_VERSION`, etc. `import { OpenAIRealtimeModel } from "getpatter"` now works.
- `engines/openai.ts` `Realtime.model` default flipped `gpt-4o-mini-realtime-preview` → `gpt-realtime-mini` for parity with Python.
Docs
31 Mintlify pages updated + 2 new pages (docs/python-sdk/providers/openai-realtime-2.mdx, docs/typescript-sdk/providers/openai-realtime-2.mdx).
Breaking changes
Both flips are opt-out with safe defaults; no API removals or renames.
- `persist=None` → defaults to ON (`persist=False` to opt out).
- `PATTER_LOG_REDACT_PHONE` → defaults to `full` (`mask` to opt back).
PR
#104 — release/0.6.2 → main (squash-merged)
0.6.1 — pipeline robustness + new providers + dashboard SPA
See PR #102 for the full changelog.
Highlights
- New providers: `OpenAIRealtime2` (TS, GA `gpt-realtime-2`), `InworldTTS`, `SpeechmaticsSTT` (TS parity)
- Pipeline: one-shot barge-in fix, first-message pacing, EOU/metrics alignment
- Dashboard: rewritten as Vite+React SPA, multi-select delete, dark-mode polish
- ElevenLabs TTS default flipped to WebSocket (TTFB ~265 ms → ~80-100 ms)
- Model-aware pricing across STT/TTS/Realtime
- Observability: OTel spans on Python, no-op stubs on TS
Install
- Python: `pip install getpatter==0.6.1`
- TypeScript: `npm install getpatter@0.6.1`
Known limitations
- `OpenAIRealtime2` over Twilio: outbound audio works via Patter-side transcoding, but GA `server_vad` is studio-tuned, so pipeline mode (STT+LLM+TTS) is the recommended production path for Twilio in 0.6.1 until OpenAI ships native g711_ulaw on GA.
- Python parity for `OpenAIRealtime2` is a follow-up; the engine is TS-only in 0.6.1.
0.6.0 — Refactor wave + Phase 3+4 SDK fixes + Mintlify docs parity
Major SDK release validated by 9 rounds of agent-to-agent acceptance testing (Phase 3 R1–R4 + Phase 4 R1–R5). Six real SDK bugs found and fixed. PR: #83.
Highlights
Fixed
- OpenAI Realtime barge-in correctness: `cancel_response` now caps `audio_end_ms` by wall-clock elapsed (was byte-counter), eliminating post-barge-in re-greeting and mid-sentence resume. Py + TS parity.
- Pipeline mode `on_transcript` fires for assistant turns + tool calls: previously emitted only by Realtime mode. Adds `LLMLoop.on_tool_call` observer + `_emit_assistant_transcript` helper. Py + TS parity.
- AssemblyAI STT (Python): coalesce 20 ms Twilio frames to a 60 ms target (above the v3 50 ms minimum). Closes parity with the TS adapter; new `flush_audio()` drains the tail.
- `getpatter.tts.elevenlabs.TTS` facade now forwards `language_code` / `voice_settings` / `chunk_size` (the facade had a narrower signature than the underlying provider, so multilingual scenarios crashed). Py + TS parity.
- Cerebras + Groq LLM pricing, silent under-billing fix: `gpt-oss-120b` (Cerebras default since 0.5.4) and 5 Groq models all billed $0. Now per-1M-token rates for every enum value. Py + TS parity.
- Pricing tables now model-aware across STT, TTS, and Realtime: pricing was provider-only, so Deepgram nova-3 multilingual users were billed at the nova-3 monolingual rate, and gpt-realtime-2 users at the gpt-realtime-mini rate (4× under-charge on audio out). New `_resolve_provider_rates` helper with longest-prefix fallback. Built-in rates for Deepgram, Whisper/Transcribe, ElevenLabs, OpenAI TTS, Cartesia, Rime, LMNT, Inworld, OpenAI Realtime. `PRICING_VERSION` 2026.2 → 2026.3. Py + TS parity.
- OpenAI Realtime engine wrapper now forwards `reasoning_effort` and `input_audio_transcription_model` (were silently dropped by the high-level wrapper). Py + TS parity.
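As a rough illustration of what "longest-prefix fallback" means for a pricing table, consider the sketch below. `resolve_rate` is a hypothetical helper, not the SDK's `_resolve_provider_rates`, and the model names and rates are placeholders rather than real prices.

```python
def resolve_rate(table: dict, model: str, default=None):
    """Return the rate whose key is the longest prefix of `model`."""
    best = None
    for key in table:
        if model.startswith(key) and (best is None or len(key) > len(best)):
            best = key
    return table[best] if best is not None else default


# Placeholder model names and rates, for illustration only.
RATES_PER_MINUTE = {
    "model-a": 0.004,
    "model-a-multilingual": 0.005,
}
```

With this table, `"model-a-multilingual-2025"` resolves to the more specific multilingual entry, while `"model-a-general"` falls back to the base `"model-a"` rate. A model matching no key returns the default, which lets the caller decide whether to bill at a provider-wide rate or flag the gap.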
Changed
- `CircuitBreakerOptions.cooldown_s` → `cooldown_ms` (Python aligned to TS `cooldownMs`). Backward-compat shim emits `DeprecationWarning`. Scheduled removal in v0.7.0.
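A rename shim of this kind usually accepts the old keyword, warns, and converts the unit. The sketch below shows the general pattern with a hypothetical `BreakerOptions` class; the 30 000 ms default is an invented placeholder, and the real `CircuitBreakerOptions` may be structured differently.

```python
import warnings


class BreakerOptions:
    """Hypothetical options object illustrating a seconds-to-milliseconds rename."""

    def __init__(self, cooldown_ms=None, cooldown_s=None):
        if cooldown_s is not None:
            warnings.warn(
                "cooldown_s is deprecated; use cooldown_ms",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            if cooldown_ms is None:
                cooldown_ms = round(cooldown_s * 1000)
        # Placeholder default, chosen for this sketch only.
        self.cooldown_ms = 30_000 if cooldown_ms is None else cooldown_ms
```

When both keywords are passed, the new `cooldown_ms` wins in this sketch, so migrated call sites behave the same before and after the old keyword is removed.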
Added
- TypeScript `manageWebhook` opt-out for `serve()`: closes a hidden footgun for users running behind a router/gateway (Terraform / edge function) whose Twilio `voice_url` is managed externally. Default `true` preserves existing behaviour.
- TypeScript SDK now ships `SpeechmaticsSTT` (closes long-standing Python-only gap). RT v2 WebSocket protocol direct via `ws`. 21-test mocked suite.
- OpenAI Realtime `gpt-realtime-2` and `gpt-realtime-whisper` model IDs with model-aware billing.
- Python parity for `ConversationStateSnapshot`, `UserState`, `AgentState`, `EouTrigger` types (catches up to TypeScript).
- MCP server integration: both SDKs expose `mcp_servers` config + dedicated docs page.
- Inworld TTS provider (TTS-2 default, NDJSON streaming).
Repo restructure
- `sdk-py/` → `libraries/python/`
- `sdk-ts/` → `libraries/typescript/`
- 33 new Mintlify provider reference pages (full Py↔TS parity across 22 providers)
Validation
- 9-round agent-to-agent acceptance matrix (Phase 3 + Phase 4)
- Python: 1707 tests pass, 7 skipped
- TypeScript: 1381 tests pass (78 files)
- All 13 CI blocking checks green on PR #83
Install
```sh
pip install --upgrade getpatter==0.6.0
npm install getpatter@0.6.0
```
Full changelog: CHANGELOG.md
0.5.4 — Cerebras default to gpt-oss-120b
Hotfix: restores gpt-oss-120b as the default Cerebras model.
Install
```bash
pip install getpatter==0.5.4
npm install getpatter@0.5.4
```
What changed
The 0.5.3 merge inadvertently kept llama3.1-8b as the Cerebras default — a regression of an earlier project decision. Bumping back to gpt-oss-120b for both Python and TypeScript SDKs.
Why gpt-oss-120b is the right default
- Throughput on Cerebras WSE-3: ~3000 tok/sec, the highest in the catalog.
- TTS-bottlenecked: voice agents consume LLM output at ~150-300 tok/sec via TTS. Both 8B and 120B models saturate the downstream pipeline, so model size doesn't add realtime latency.
- No scheduled deprecation, whereas `llama3.1-8b` retires 2026-05-27 and the preview models (`qwen-3-235b-a22b-instruct-2507`, `zai-glm-4.7`) carry no SLA.
- Quality: 120B parameters give materially better answer quality at no realtime cost.
Override
Other models remain reachable via model=:
```python
from getpatter import CerebrasLLM
agent = CerebrasLLM(model="llama3.1-8b") # smaller, free-tier
agent = CerebrasLLM(model="qwen-3-235b-a22b-instruct-2507") # preview
```
If your tier returns 404 for gpt-oss-120b, the provider's stream() logs a recovery hint listing override candidates.
Compatibility
- Drop-in replacement for 0.5.3.
- Python ≥ 3.11, Node ≥ 18.
Full changelog
See the v0.5.3 release notes for the full 0.5.x feature set; 0.5.4 is a single-line default-model change on top.
0.5.3 — latency pass + observability + parity
First polishing release of the public SDK. Cost-accuracy, audio-pipeline, and observability hardening across both SDKs, plus opt-in per-call filesystem logging, telephony optimizations, and provider tunings.
Install
```bash
pip install getpatter==0.5.3
npm install getpatter@0.5.3
```
Highlights
Latency
End-to-end P50 (user-stop → first TTS audio byte) reduced by ~1000-2000 ms:
- STT: Python `speech_final` parity with TypeScript (Deepgram fast endpointing, ~300-700 ms saved per turn). Default `smart_format=False` for telephony. Whisper / `OpenAITranscribeSTT` always flush on close.
- LLM: Anthropic prompt caching enabled by default (`cache_control: ephemeral` on system + last tool block). Cerebras hardening: retry + structured outputs + sampling kwargs forwarding. New `before_llm` / `after_llm` pipeline hooks for PII redaction, output validation, prompt rewriting.
- TTS: Cartesia bumped to `sonic-3` (~90 ms TTFB). OpenAI TTS chunk size 4096 → 1024. Sentence chunker emits short greetings immediately. New telephony factories (`for_twilio()` / `for_telnyx()`) on ElevenLabs, Cartesia, and ConvAI that negotiate carrier-native codecs (`ulaw_8000` / 8 kHz PCM) and skip per-chunk SDK transcoding.
- Realtime: OpenAI Realtime `silence_duration_ms` 500 → 300.
- Telephony: Telnyx `answer` + `streaming_start` consolidated into a single API call (saves one webhook round-trip). TS Twilio outbound switched from `Url:` to inline `Twiml:` (parity with Python adapter, saves another round-trip). `stream_track` set to `inbound_track` (halves WS upstream bandwidth). Default `ring_timeout` lowered from 60 s to 25 s.
- Infrastructure: `notify_dashboard` made async + fire-and-forget (avoids a 1-3 s stall when the dashboard is offline). TS `call-log` switched to `fs.promises` to keep ~75 ms of cumulative blocking off the Node main thread.
Providers
- New first-class `OpenAITranscribeSTT` for `gpt-4o-transcribe` / `gpt-4o-mini-transcribe`.
- Typed ElevenLabs model literal: `eleven_v3` / `eleven_flash_v2_5` / `eleven_turbo_v2_5` / `eleven_multilingual_v2` / `eleven_monolingual_v1`.
- Cerebras: `response_format` (JSON mode + structured outputs), `parallel_tool_calls`, `tool_choice`, `seed`, `top_p`, `frequency_penalty`, `presence_penalty`, `stop`, `User-Agent` telemetry, `max_completion_tokens`, gzip compression on by default in TypeScript (parity with Python).
- Same OpenAI-spec sampling kwargs lifted to the `OpenAILLMProvider` parent so every OpenAI-compat client benefits.
Observability
- `LatencyBreakdown` extended with `endpoint_ms`, `bargein_ms`, `tts_total_ms`, and properly split `llm_ttft_ms` / `llm_total_ms`.
- New aggregate: `latency_p90` alongside P50 / P95 / P99.
- New OTel spans `getpatter.endpoint` and `getpatter.bargein`. The pre-existing `getpatter.llm` span is now actually emitted around the pipeline LLM call.
- New `EventBus` event types: `transcript_partial`, `transcript_final`, `llm_chunk`, `tts_chunk`, `tool_call_started`.
- TS span names normalised to `getpatter.*` everywhere.
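For context on the P50 / P90 / P95 / P99 aggregates, the sketch below computes them with the nearest-rank method. The percentile definition the SDK actually uses is not stated in these notes, so the method and the helper names (`percentile`, `latency_summary`) are assumptions.

```python
import math


def percentile(values, p: int):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarise per-turn latencies at the four percentiles named above."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 95, 99)}
```

Nearest-rank always returns an observed sample, which keeps tail figures such as P99 from being interpolated into values no call actually experienced.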
Compatibility
- Python ≥ 3.11.
- Node ≥ 18.
- No public-API breaks. Anything that was deprecated in 0.5.x continues to work; the new factories (`for_twilio()` / `for_telnyx()`) are opt-in.
Full changelog
See CHANGELOG.md and PR #76.
v0.5.2
Fixed
- ElevenLabs default voice: Rachel → Sarah. Rachel (`21m00Tcm4TlvDq8ikWAM`) is a library voice that free-tier ElevenLabs accounts cannot use via the API, so `new ElevenLabsTTS()` / `ElevenLabsTTS()` without an explicit `voice_id` used to fail on the first synthesis with `402 paid_plan_required`. The default is now Sarah (`EXAVITQu4vr4xnSDxMaL`), a premade voice available to every account.
- `alloy` alias now resolves to Sarah for the same reason.
- `rachel` alias still resolves to her original ID. Pass `voice="rachel"` explicitly to keep using her (requires a paid ElevenLabs plan).
- Added `sarah` alias alongside the existing `bella` (same voice ID).
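The alias behaviour described above amounts to a small lookup table with a default. In the sketch below the two voice IDs are the ones quoted in these notes, while `resolve_voice_id`, the table layout, and the case-insensitive matching are assumptions made for illustration.

```python
SARAH_ID = "EXAVITQu4vr4xnSDxMaL"   # premade voice, available to every account
RACHEL_ID = "21m00Tcm4TlvDq8ikWAM"  # library voice, needs a paid plan via the API

VOICE_ALIASES = {
    "sarah": SARAH_ID,
    "bella": SARAH_ID,   # same voice ID as "sarah"
    "alloy": SARAH_ID,   # now resolves to Sarah
    "rachel": RACHEL_ID,  # unchanged, but no longer the default
}


def resolve_voice_id(voice=None) -> str:
    """Map an alias to a voice ID; pass unknown strings through as raw IDs."""
    if voice is None:
        return SARAH_ID
    return VOICE_ALIASES.get(voice.lower(), voice)
```

Passing unknown strings through unchanged lets callers supply any raw voice ID without registering an alias first.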
Changed (UX)
- Startup banner now renders at the top of the terminal output (before tunnel / webhook setup logs), with a visually distinct Dashboard section using box-drawing separators.
- Reduced per-frame log noise during calls: removed `WS event:`, `Telnyx event:`, `Upgrade request:`, `WebSocket connected:` lines. Only `Call started` / `Call ended` remain on the happy path.
Install
pip install --upgrade getpatter==0.5.2
npm install getpatter@0.5.2

Full changelog: v0.5.1...v0.5.2
v0.5.1 — First-class llm= selector + 5 LLM providers
Patter 0.5.1 adds llm= as a first-class selector on phone.agent(), mirroring the stt= / tts= / engine= pattern shipped in 0.5.0. Five LLM provider classes with env-var fallback ship in both SDKs:
from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")
agent = phone.agent(
    stt=DeepgramSTT(),                     # DEEPGRAM_API_KEY
    llm=AnthropicLLM(),                    # ANTHROPIC_API_KEY
    tts=ElevenLabsTTS(voice_id="rachel"),  # ELEVENLABS_API_KEY
    system_prompt="You are helpful.",
)
await phone.serve(agent)

The TypeScript mirror uses `new AnthropicLLM()`.
What's new
- `OpenAILLM`, `AnthropicLLM`, `GroqLLM`, `CerebrasLLM`, `GoogleLLM`: namespaced classes (`getpatter.llm.{vendor}.LLM` in Python, `getpatter/llm/{vendor}` in TS) plus flat aliases from the package root.
- Env-var fallback per provider: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GROQ_API_KEY`, `CEREBRAS_API_KEY`, `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`).
- Tool calling works across all 5 providers: each adapter normalizes vendor-specific tool formats to Patter's unified chunk protocol.
- `llm=` and `on_message` are mutually exclusive: a conflict raises a clear error at `serve()` time; `engine + llm` logs a one-time warning (engine owns the LLM).
- Clean logging: two INFO lines per call (start + end):
  - `Call started: CAxxx (Twilio, engine=openai_realtime, +15550001 → +15550002)`
  - `Call ended: CAxxx (42.3s, 8 turns, cost=$0.0127, p95=612ms)`
  - Everything else (STT transcripts, barge-in, hallucination filter, DTMF, per-turn guardrail triggers) demoted to `debug`.
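The env-var fallback listed above, including the `GEMINI_API_KEY` → `GOOGLE_API_KEY` chain, can be sketched as an ordered lookup. `resolve_api_key` and its error message are hypothetical; only the variable names come from these notes.

```python
import os


def resolve_api_key(explicit, env_names, environ=None) -> str:
    """Prefer an explicit key, then the first non-empty variable in `env_names`."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    for name in env_names:
        value = environ.get(name)
        if value:
            return value
    raise ValueError(
        "No API key found: pass one explicitly or set " + " or ".join(env_names)
    )


# Lookup order for the Google provider, as described in the list above.
GOOGLE_ENV_CHAIN = ["GEMINI_API_KEY", "GOOGLE_API_KEY"]
```

Accepting an injectable `environ` mapping keeps the lookup easy to exercise in tests without mutating the real process environment.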
Fixes
- TypeScript bundler (critical): `tsup` was bundling `cloudflared` into the ESM dist, and since `cloudflared` is CJS and calls `require("path")` at runtime, `serve({ tunnel: true })` crashed with `Dynamic require of "path" is not supported`. Fixed by externalizing `cloudflared` and `@ngrok/ngrok` in a new `tsup.config.ts` so they resolve from the consumer's `node_modules/` at runtime.
- CI: `anthropic` / `google-genai` optional-extras tests now skip gracefully on the base Python matrix (they still run end-to-end in the `python-all-extras` job).
- Parity: `provider="pipeline"` is now derived in the TypeScript client when the user passes only `llm=` without an engine, matching the Python client.
Install
pip install getpatter # 0.5.1
pip install "getpatter[anthropic]" # + Anthropic adapter
pip install "getpatter[google]" # + Google Gemini
pip install "getpatter[tunnel]" # + cloudflared for dev tunnels
npm install getpatter # 0.5.1
npm install cloudflared # optional, only if using tunnel: true
Validation
- Python: 1350 pytest passed, 8 skipped, 0 failures (1327 baseline + 23 new in `tests/unit/test_llm_api.py`)
- TypeScript: 1042 vitest passed across 61 files, `tsc --noEmit` clean (1013 baseline + 29 new in `tests/unit/llm-api.test.ts`)
- 4-line pipeline-mode quickstart with `llm=AnthropicLLM()` verified in both SDKs
Unchanged
- Default OpenAI LLMLoop still auto-constructs when `llm=` is absent and `openai_key` is present, so there is zero break from 0.5.0.
- `on_message` callback still works for custom LLM logic (multi-model routing, local `llama.cpp`, etc.). Mutually exclusive with `llm=`.
Links
- PR: #69
- Docs: https://docs.getpatter.com