
Releases: PatterAI/Patter

v0.6.5

05 Jun 13:28
fae9dc1


0.6.5

Patter as a voice shell in front of an OpenAI-compatible agent runtime.

Added

  • OpenAICompatibleLLM / HermesLLM / OpenClawLLM pipeline-mode LLM providers (Python + TypeScript) — drive a phone call where each turn is answered by an external agent runtime at POST {base_url}/chat/completions (Hermes, OpenClaw, Ollama, vLLM, LM Studio).
  • Opt-in session continuity: Hermes X-Hermes-Session-Id (per call) + optional X-Hermes-Session-Key (long-term memory scope); OpenClaw user + x-openclaw-session-key.
  • Opt-in llm_error_message / llmErrorMessage spoken fallback when the LLM stream errors (gateway down / timeout) before any audio reached the caller.
  • Backward-compatible call_id introspection guard: custom providers whose signature accepts neither call_id nor **kwargs keep working.
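The introspection guard can be pictured with `inspect.signature`: forward `call_id` only when the provider can actually receive it. This is a minimal sketch; `accepts_call_id` and `invoke_stream` are hypothetical names, and Patter's real check may be structured differently.

```python
import inspect
from typing import Any, Callable


def accepts_call_id(fn: Callable[..., Any]) -> bool:
    """True if fn can take call_id as a keyword (named param or **kwargs)."""
    for p in inspect.signature(fn).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        # A positional-only parameter named call_id cannot receive a keyword.
        if p.name == "call_id" and p.kind is not inspect.Parameter.POSITIONAL_ONLY:
            return True
    return False


def invoke_stream(stream: Callable[..., Any], messages: list, call_id: str) -> Any:
    # Hypothetical dispatch: older custom providers are called exactly as before.
    if accepts_call_id(stream):
        return stream(messages, call_id=call_id)
    return stream(messages)
```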

PRs: #157 (feature), #158 (release bump).

v0.6.4

05 Jun 00:39
7c8077a


Patter 0.6.4 — published to PyPI and npm.

Highlights

  • Server-managed OpenAI Realtime turn-taking. For the OpenAIRealtime (v1) and OpenAIRealtime2 (GA) engines the OpenAI server now owns VAD, end-of-turn, response creation, and barge-in. Lower latency, prompt barge-in, and the model now replies on the server audio-commit instead of waiting for the Whisper transcript — the transcript becomes pure (live, ordered) observability and a hallucination can no longer suppress a real reply. Opt back into the legacy client-managed path with gateResponseOnTranscript / gate_response_on_transcript. (#154)
  • Fixed: Twilio + OpenAI Realtime garbled/static audio on all models — both Realtime engines route through the GA adapter with correct PCM24↔mulaw8 transcoding. (#154)
  • Tune the Realtime VAD for noisy lines — new RealtimeTurnDetection (threshold / semantic_vad eagerness) and input noise_reduction (far_field) stop speakerphone / room noise from cutting the agent off.
  • Built-in consult escalation tool + native OpenClaw / OpenAI-compatible target — give an in-call agent an on-demand bridge to your back-office agent, with a post-call notify hook.
  • Long-running tools no longer drop the call — per-tool execution timeout (up to 300 s), tool_call_preambles, and reassurance filler keep the line alive during 30–60 s tool calls.
  • Dashboard security — auto-protected with a generated token when exposed beyond 127.0.0.1; allow_insecure_dashboard escape hatch. Plus Plivo carrier support in the dashboard UI.
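The long-running-tool pattern above (a hard per-tool timeout plus periodic reassurance filler) can be sketched with plain asyncio. `run_tool_with_filler`, the `say` callback, and the 10 s filler interval are hypothetical; this illustrates the idea and is not Patter's implementation.

```python
import asyncio
from typing import Awaitable, Callable


async def run_tool_with_filler(
    tool: Awaitable[str],
    say: Callable[[str], Awaitable[None]],
    timeout_s: float = 300.0,
    filler_every_s: float = 10.0,
    filler_text: str = "Still working on that, one moment.",
) -> str:
    """Await a slow tool, speaking filler every filler_every_s seconds.

    Raises asyncio.TimeoutError if the tool is still running after timeout_s;
    the tool task is cancelled in that case.
    """
    task = asyncio.ensure_future(tool)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    try:
        while loop.time() < deadline:
            remaining = max(0.0, deadline - loop.time())
            done, _ = await asyncio.wait({task}, timeout=min(filler_every_s, remaining))
            if task in done:
                return task.result()
            if loop.time() < deadline:
                # Tool still running: keep the caller on the line.
                await say(filler_text)
        raise asyncio.TimeoutError(f"tool still running after {timeout_s} s")
    finally:
        if not task.done():
            task.cancel()
```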

Full details: CHANGELOG.md · PR #156

v0.6.3

29 May 11:09
9d775de


Highlights

Added

  • Completion-aware outbound calls — call(wait=True) → CallResult (both SDKs). Place a call and await its real outcome in one line instead of hand-wiring on_call_end/onCallEnd to an event. wait defaults to false (fire-and-forget, unchanged), so it's fully backward compatible. CallResult carries outcome (answered / voicemail / no_answer / busy / failed), status, duration, transcript, cost, metrics — every value from a real carrier signal.
  • Guaranteed teardown: async with Patter(...) (Python) / await using via [Symbol.asyncDispose] (TypeScript). Exiting always runs disconnect(), closing a billing leak in which a stray TTS WebSocket kept billing after the SDK was done.
  • Plivo as a third telephony carrier (both SDKs) with full Twilio/Telnyx parity — V3 HMAC-SHA256 webhook signature verification (fails closed), bidirectional media WebSocket, native sendDTMF, voicemail drop, CDR-reconciled pricing. call(wait=True) resolves for Plivo too. Thanks @amalshaji-plivo (#121).
  • Contributor guardrails: AGENTS.md, a PR template, and a CONTRIBUTING.md pre-PR checklist (CHANGELOG, both-SDK parity, pr-validate.sh, notebook parity).
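The two call-lifecycle features above combine naturally: an async context manager whose `__aexit__` always disconnects, and a `call()` that either returns immediately or awaits the end-of-call signal. The sketch below uses a hypothetical `FakePhone` with a simulated carrier signal; it only illustrates the pattern, and the `CallResult` field types are assumptions (only a subset of the listed fields is shown).

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallResult:
    # A subset of the fields named above; the types are assumptions.
    outcome: str  # answered / voicemail / no_answer / busy / failed
    status: str
    duration: float
    transcript: list = field(default_factory=list)


class FakePhone:
    """Hypothetical stand-in, not Patter's client."""

    def __init__(self) -> None:
        self.connected = False

    async def __aenter__(self) -> "FakePhone":
        self.connected = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises; in a real client
        # this is where lingering sockets would be closed.
        await self.disconnect()

    async def disconnect(self) -> None:
        self.connected = False

    async def call(self, to: str, wait: bool = False) -> Optional[CallResult]:
        loop = asyncio.get_running_loop()
        done = loop.create_future()
        # Simulate the carrier's end-of-call signal arriving 50 ms later.
        loop.call_later(0.05, done.set_result,
                        CallResult("answered", "completed", 12.0))
        if not wait:
            return None  # fire-and-forget, the default
        return await done
```

In Patter itself, the notes name the equivalents `call(wait=True)` and `async with Patter(...)`.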

Fixed

  • Plivo + Pipeline + ElevenLabs garbled/static outbound audio (TypeScript) — added the plivo → ulaw_8000 native-format case so already-μ-law TTS bytes aren't re-encoded. Python was unaffected.
  • PatterTool (Python) always reported cost_usd=None / duration_seconds=0.0 — the result builder probed the CallMetrics payload as a dict; rebuilt on the call(wait=True) path.

Install: pip install getpatter==0.6.3 · npm install getpatter@0.6.3

Full changelog: CHANGELOG.md → 0.6.3. PRs: #120, #121.

v0.6.2 — GA Realtime adapter + 14-bug fix wave + dashboard hardening

25 May 10:22
5db6779


Bundled fix wave validated live via PSTN against the inbound and outbound OpenAI Realtime paths. New OpenAIRealtime2 engine + adapter (Python parity with TS) speaks the GA Realtime API (gpt-realtime-2), with bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding for Twilio / Telnyx.

Highlights

  • New OpenAIRealtime2 engine + OpenAIRealtime2Adapter (Python parity with TS) — speaks the GA session.update shape (session.type = "realtime", nested audio.{input,output}, output_modalities), with bidirectional mulaw ↔ PCM 24 kHz transcoding.
  • Inbound caller/callee via TwiML <Parameter> — Twilio strips URL query params before the WS handshake, so caller/callee now travel as <Parameter> children of <Stream>. Inbound calls in the dashboard now show the correct numbers.
  • Whisper hallucination filter on the Realtime transcript_input event — drops "Thank you for watching.", "[music]", and 13 other YouTube-caption fallbacks that Whisper emits on silence/echo, eliminating phantom user turns.
  • Deferred response.create with new request_response() method — turn_detection.create_response: false + interrupt_response: false so Patter drives the assistant turn ONLY after the hallucination filter accepts the user transcript.
  • VAD threshold raised 0.1 → 0.5 on the GA Realtime path — kills the phantom barge-in loop where carrier-loopback echo of the agent's own audio tripped server VAD.
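The filter-then-respond flow above can be sketched as a normalized set lookup that gates the deferred reply. Only the two phrases named in these notes are included (the other 13 are not reproduced), and `on_transcript_input` with a `request_response` callable is a hypothetical wiring, not Patter's code.

```python
import re
from typing import Callable

# Only the two examples named in the notes; the real list is longer.
HALLUCINATIONS = {"thank you for watching.", "[music]"}


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so "  [MUSIC] " matches "[music]".
    return re.sub(r"\s+", " ", text).strip().lower()


def accept_transcript(text: str) -> bool:
    """False for empty transcripts and known silence/echo hallucinations."""
    norm = normalize(text)
    return bool(norm) and norm not in HALLUCINATIONS


def on_transcript_input(text: str, request_response: Callable[[], None]) -> bool:
    # With server-side create_response disabled, the assistant turn is
    # requested only after the filter accepts the user transcript.
    if accept_transcript(text):
        request_response()
        return True
    return False
```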

Persistence + dashboard

  • persist default flipped OFF → ON in both SDKs. Calls now survive process restarts by default (path: ~/Library/Application Support/patter on macOS / XDG dir on Linux / %LOCALAPPDATA% on Windows). Migration: pass persist=False for the old ephemeral-RAM-only behaviour.
  • PATTER_LOG_REDACT_PHONE default flipped mask → full so the dashboard UI reveal toggle works (you can't unmask numbers the SDK never knew). Migration: set PATTER_LOG_REDACT_PHONE=mask for setups that ship logs off-host.
  • direction now persisted in metadata.json — fixes outbound calls rendering as inbound (and top-bar showing the callee instead of the Patter number) after restart.
  • aggregates.sdk_version field — SPA top-bar version chip now auto-derives from getpatter.__version__ / package.json#version.
  • Python hydrate() now backfills transcript from sibling transcript.jsonl when metadata.json has no array (TS already did).
  • record_call_end preserves the live turns array + falls back to active/existing transcript when the SDK end-of-call snapshot is empty.
  • Standalone dashboard (patter dashboard) now sees outbound dials in real time via notify_dashboard relay of call_initiated.
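The per-OS persistence locations listed above can be resolved with a small function. This is a sketch under two assumptions not stated in the notes: "XDG dir" is taken to mean `$XDG_DATA_HOME` (default `~/.local/share`), and a `patter` subdirectory is used on every OS. `default_persist_dir` is a hypothetical name.

```python
from pathlib import Path
from typing import Mapping


def default_persist_dir(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Per-OS data directory; platform uses sys.platform values."""
    if platform == "darwin":
        return home / "Library" / "Application Support" / "patter"
    if platform == "win32":
        # Assumed fallback when %LOCALAPPDATA% is unset.
        base = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(base) / "patter"
    # Linux and other Unix-likes: XDG data dir (assumption).
    base = env.get("XDG_DATA_HOME") or str(home / ".local" / "share")
    return Path(base) / "patter"
```

A typical call would be `default_persist_dir(sys.platform, os.environ, Path.home())`; passing the inputs explicitly keeps the function testable on any OS.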

Prewarm + adoption hardening

  • Liveness check rewritten to handle the current websockets library (state enum + close_code checks; legacy closed fallback). Pre-fix getattr(ws, "closed", True) defaulted to "dead" on the new client and silently aborted every adoption.
  • Application-level keepalive on the parked GA Realtime WS (session.update every 3 s + WS PING every 4 s) — OpenAI's GA edge closes idle sockets within ~6-7 s.
  • cancel_response is now a no-op when no item is in flight — eliminates response_cancel_not_active ERROR spam on every phantom VAD speech_started.
  • firstMessage no longer truncated by loopback echo VAD — barge-in gate consults _current_response_first_audio_at on the adapter.
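The liveness fix above can be sketched as a duck-typed check: prefer the `state` enum and `close_code`, and fall back to a boolean `closed` only when that is all the object exposes. `is_ws_alive` is a hypothetical helper; it compares the enum by member name so the sketch runs without importing websockets.

```python
def is_ws_alive(ws: object) -> bool:
    """Best-effort liveness check for a parked WebSocket.

    Current websockets connections expose a `state` enum (CONNECTING, OPEN,
    CLOSING, CLOSED) and a `close_code`; objects that expose only a boolean
    `closed` are handled as a fallback. Anything else is treated as dead.
    """
    state = getattr(ws, "state", None)
    if state is not None:
        # Compare by enum member name so this runs without importing websockets.
        is_open = getattr(state, "name", None) == "OPEN"
        return is_open and getattr(ws, "close_code", None) is None
    closed = getattr(ws, "closed", None)
    # The pre-fix bug was getattr(ws, "closed", True), which reported every
    # connection lacking a `closed` attribute as dead.
    return closed is False
```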

TS surface additions

  • Top-level re-exports of OpenAIRealtimeModel, OpenAITranscriptionModel, OpenAIRealtimeAudioFormat, OpenAIVoice, ElevenLabsModel, ElevenLabsOutputFormat, DeepgramModel, CartesiaTTSModel, RimeModel, PricingUnit, PRICING_VERSION, etc. — import { OpenAIRealtimeModel } from "getpatter" now works.
  • engines/openai.ts Realtime.model default flipped gpt-4o-mini-realtime-preview → gpt-realtime-mini for parity with Python.

Docs

31 Mintlify pages updated + 2 new pages (docs/python-sdk/providers/openai-realtime-2.mdx, docs/typescript-sdk/providers/openai-realtime-2.mdx).

Breaking changes

Both flips are opt-out with safe defaults; no API removals or renames.

  • persist=None → defaults to ON (persist=False to opt out).
  • PATTER_LOG_REDACT_PHONE → defaults to full (mask to opt back).

PR

#104 — release/0.6.2 → main (squash-merged)

0.6.1 — pipeline robustness + new providers + dashboard SPA

17 May 13:27
02d4d04


See PR #102 for the full changelog.

Highlights

  • New providers: OpenAIRealtime2 (TS, GA gpt-realtime-2), InworldTTS, SpeechmaticsSTT (TS parity)
  • Pipeline: one-shot barge-in fix, first-message pacing, EOU/metrics alignment
  • Dashboard: rewritten as Vite+React SPA, multi-select delete, dark-mode polish
  • ElevenLabs TTS default flipped to WebSocket (TTFB ~265 ms → ~80-100 ms)
  • Model-aware pricing across STT/TTS/Realtime
  • Observability: OTel spans on Python, no-op stubs on TS

Install

  • Python: pip install getpatter==0.6.1
  • TypeScript: npm install getpatter@0.6.1

Known limitations

  • OpenAIRealtime2 over Twilio: outbound audio works via Patter-side transcoding but GA server_vad is studio-tuned, so pipeline mode (STT+LLM+TTS) is the recommended production path for Twilio in 0.6.1 until OpenAI ships native g711_ulaw on GA.
  • Python parity for OpenAIRealtime2 is a follow-up — TS-only in 0.6.1.

0.6.0 — Refactor wave + Phase 3+4 SDK fixes + Mintlify docs parity

08 May 21:46
2d47d7d


Major SDK release validated by 9 rounds of agent-to-agent acceptance testing (Phase 3 R1–R4 + Phase 4 R1–R5). Six real SDK bugs found and fixed. PR: #83.

Highlights

Fixed

  • OpenAI Realtime barge-in correctness: cancel_response now caps audio_end_ms by wall-clock elapsed (was byte-counter), eliminating post-barge-in re-greeting and mid-sentence resume. Py + TS parity.
  • Pipeline mode on_transcript fires for assistant turns + tool calls: previously emitted only by Realtime mode. Adds LLMLoop.on_tool_call observer + _emit_assistant_transcript helper. Py + TS parity.
  • AssemblyAI STT (Python): coalesce 20 ms Twilio frames to 60 ms target (above v3 50 ms minimum). Closes parity with TS adapter; new flush_audio() drains tail.
  • getpatter.tts.elevenlabs.TTS facade now forwards language_code / voice_settings / chunk_size (the facade had a narrower signature than the underlying provider — multilingual scenarios crashed). Py + TS parity.
  • Cerebras + Groq LLM pricing — silent under-billing fix: gpt-oss-120b (Cerebras default since 0.5.4) and 5 Groq models all billed $0. Now per-1M-token rates for every enum value. Py + TS parity.
  • Pricing tables now model-aware across STT, TTS, and Realtime: was provider-only, so Deepgram nova-3 multilingual users were billed at nova-3 monolingual rate, gpt-realtime-2 users at gpt-realtime-mini rate (4× under-charge on audio out). New _resolve_provider_rates helper with longest-prefix fallback. Built-in rates for Deepgram, Whisper/Transcribe, ElevenLabs, OpenAI TTS, Cartesia, Rime, LMNT, Inworld, OpenAI Realtime. PRICING_VERSION 2026.2 → 2026.3. Py + TS parity.
  • OpenAI Realtime engine wrapper now forwards reasoning_effort and input_audio_transcription_model (were silently dropped by the high-level wrapper). Py + TS parity.
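The longest-prefix fallback mentioned in the pricing fix can be sketched as follows: an exact model ID wins, otherwise the most specific matching prefix is used. `resolve_rate` is a hypothetical stand-in for `_resolve_provider_rates`, and the rates in any table you pass are up to you (the test uses placeholder numbers, not real prices).

```python
from typing import Mapping, Optional


def resolve_rate(table: Mapping[str, float], model: str) -> Optional[float]:
    """Rate for the longest key in `table` that is a prefix of `model`.

    An exact match is simply the longest possible prefix. None means the
    model is unknown and the caller should fall back to a provider default.
    """
    best: Optional[str] = None
    for key in table:
        if model.startswith(key) and (best is None or len(key) > len(best)):
            best = key
    return table[best] if best is not None else None
```

Longest-prefix matching is what keeps a snapshot-suffixed ID billed at its family's rate, while a more specific entry (such as a mini variant) is never shadowed by a shorter generic key.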

Changed

  • CircuitBreakerOptions.cooldown_s → cooldown_ms (Python aligned to TS cooldownMs). A backward-compat shim emits DeprecationWarning and is scheduled for removal in v0.7.0.
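A deprecation shim of this kind usually accepts the old keyword, converts it, and warns. The sketch below uses a hypothetical `BreakerOptions` class; the rule that an explicit `cooldown_ms` wins when both are passed is an assumption, not documented Patter behaviour.

```python
import warnings
from typing import Optional


class BreakerOptions:
    """Hypothetical sketch of a cooldown_s -> cooldown_ms compatibility shim."""

    def __init__(self, cooldown_ms: Optional[int] = None,
                 cooldown_s: Optional[float] = None) -> None:
        if cooldown_s is not None:
            warnings.warn(
                "cooldown_s is deprecated; use cooldown_ms "
                "(scheduled for removal in v0.7.0)",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: an explicit cooldown_ms takes precedence.
            if cooldown_ms is None:
                cooldown_ms = round(cooldown_s * 1000)
        # None means "use the library default".
        self.cooldown_ms = cooldown_ms
```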

Added

  • TypeScript manageWebhook opt-out for serve() — closes a hidden footgun for users running behind a router/gateway (Terraform / edge function) whose Twilio voice_url is managed externally. Default true preserves existing behaviour.
  • TypeScript SDK now ships SpeechmaticsSTT (closes long-standing Python-only gap). RT v2 WebSocket protocol direct via ws. 21-test mocked suite.
  • OpenAI Realtime gpt-realtime-2 and gpt-realtime-whisper model IDs with model-aware billing.
  • Python parity for ConversationStateSnapshot, UserState, AgentState, EouTrigger types (catches up to TypeScript).
  • MCP server integration — both SDKs expose mcp_servers config + dedicated docs page.
  • Inworld TTS provider (TTS-2 default, NDJSON streaming).

Repo restructure

  • sdk-py/ → libraries/python/
  • sdk-ts/ → libraries/typescript/
  • 33 new Mintlify provider reference pages (full Py↔TS parity across 22 providers)

Validation

  • 9-round agent-to-agent acceptance matrix (Phase 3 + Phase 4)
  • Python: 1707 tests pass, 7 skipped
  • TypeScript: 1381 tests pass (78 files)
  • All 13 CI blocking checks green on PR #83

Install

```sh
pip install --upgrade getpatter==0.6.0
npm install getpatter@0.6.0
```

Full changelog: CHANGELOG.md

0.5.4 — Cerebras default to gpt-oss-120b

27 Apr 18:58


Hotfix: restores gpt-oss-120b as the default Cerebras model.

Install

```bash
pip install getpatter==0.5.4
npm install getpatter@0.5.4
```

What changed

The 0.5.3 merge inadvertently kept llama3.1-8b as the Cerebras default — a regression of an earlier project decision. Bumping back to gpt-oss-120b for both Python and TypeScript SDKs.

Why gpt-oss-120b is the right default

  • Throughput on Cerebras WSE-3: ~3000 tok/sec, the highest in the catalog.
  • TTS-bottlenecked: voice agents consume LLM output at ~150-300 tok/sec via TTS. Both the 8B and 120B models generate faster than the downstream pipeline can consume, so model size doesn't add realtime latency.
  • No deprecation risk: llama3.1-8b retires on 2026-05-27, and the preview models (qwen-3-235b-a22b-instruct-2507, zai-glm-4.7) carry no SLA.
  • Quality: 120B parameters give materially better answer quality at no realtime cost.
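A quick back-of-the-envelope check of the TTS-bottleneck argument, using the throughput figures above. The 50-token reply length is an assumption chosen for illustration.

```python
# Figures from the notes above; REPLY_TOKENS is an assumed reply length.
GEN_TOK_PER_S = 3000          # gpt-oss-120b on Cerebras WSE-3 (approx.)
TTS_TOK_PER_S = (150, 300)    # rate at which TTS consumes LLM output
REPLY_TOKENS = 50

gen_ms = REPLY_TOKENS / GEN_TOK_PER_S * 1000
speak_ms = [REPLY_TOKENS / r * 1000 for r in TTS_TOK_PER_S]
headroom = [GEN_TOK_PER_S / r for r in TTS_TOK_PER_S]

print(f"generate: {gen_ms:.0f} ms, speak: {speak_ms[1]:.0f}-{speak_ms[0]:.0f} ms")
print(f"generation runs {headroom[1]:.0f}x-{headroom[0]:.0f}x faster than TTS")
# generate: 17 ms, speak: 167-333 ms
# generation runs 10x-20x faster than TTS
```

Generating the whole reply takes a small fraction of the time needed to speak it, so the user-perceived pace is set by TTS, not by the model.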

Override

Other models remain reachable via model=:

```python
from getpatter import CerebrasLLM

# Pick one; each replaces the gpt-oss-120b default.
llm = CerebrasLLM(model="llama3.1-8b")  # smaller, free-tier
# llm = CerebrasLLM(model="qwen-3-235b-a22b-instruct-2507")  # preview, no SLA
```

If your tier returns 404 for gpt-oss-120b, the provider's stream() logs a recovery hint listing override candidates.

Compatibility

  • Drop-in replacement for 0.5.3.
  • Python ≥ 3.11, Node ≥ 18.

Full changelog

See the v0.5.3 release notes for the full 0.5.x feature set; 0.5.4 is a single-line default-model change on top.

0.5.3 — latency pass + observability + parity

27 Apr 18:46


First polishing release of the public SDK. Cost-accuracy, audio-pipeline, and observability hardening across both SDKs, plus opt-in per-call filesystem logging, telephony optimizations, and provider tunings.

Install

```bash
pip install getpatter==0.5.3
npm install getpatter@0.5.3
```

Highlights

Latency

End-to-end P50 (user-stop → first TTS audio byte) reduced by ~1000-2000 ms:

  • STT: Python speech_final parity with TypeScript (Deepgram fast endpointing, ~300-700 ms saved per turn). Default smart_format=False for telephony. Whisper / OpenAITranscribeSTT always flush on close.
  • LLM: Anthropic prompt caching enabled by default (cache_control: ephemeral on system + last tool block). Cerebras hardening: retry + structured outputs + sampling kwargs forwarding. New before_llm / after_llm pipeline hooks for PII redaction, output validation, prompt rewriting.
  • TTS: Cartesia bumped to sonic-3 (~90 ms TTFB). OpenAI TTS chunk size 4096 → 1024. Sentence chunker emits short greetings immediately. New telephony factories (for_twilio() / for_telnyx()) on ElevenLabs, Cartesia, and ConvAI that negotiate carrier-native codecs (ulaw_8000 / 8 kHz PCM) and skip per-chunk SDK transcoding.
  • Realtime: OpenAI Realtime silence_duration_ms 500 → 300.
  • Telephony: Telnyx answer + streaming_start consolidated into a single API call (saves one webhook round-trip). TS Twilio outbound switched from Url: to inline Twiml: (parity with Python adapter, saves another round-trip). stream_track set to inbound_track (halves WS upstream bandwidth). Default ring_timeout lowered from 60 s to 25 s.
  • Infrastructure: notify_dashboard made async + fire-and-forget (avoids 1-3 s stall when dashboard is offline). TS call-log switched to fs.promises to keep ~75 ms of cumulative blocking off the Node main thread.
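The sentence chunker's "emit short greetings immediately" behaviour amounts to flushing on the first sentence terminator rather than waiting for a minimum length. This is a naive sketch; `chunk_sentences` is hypothetical, and it would wrongly split on abbreviations such as "Dr. Smith", which a production chunker must handle.

```python
import re
from typing import Iterable, Iterator

# A terminator followed by whitespace marks a complete sentence.
_SENTENCE_END = re.compile(r"([.!?])\s")


def chunk_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of LLM text chunks.

    No minimum length is enforced, so "Hi!" reaches TTS as soon as the
    following whitespace arrives. Trailing text is flushed at end of stream.
    """
    buf = ""
    for tok in tokens:
        buf += tok
        while True:
            m = _SENTENCE_END.search(buf)
            if not m:
                break
            end = m.end(1)
            sentence = buf[:end].strip()
            buf = buf[end:].lstrip()
            if sentence:
                yield sentence
    tail = buf.strip()
    if tail:
        yield tail
```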

Providers

  • New first-class OpenAITranscribeSTT for gpt-4o-transcribe / gpt-4o-mini-transcribe.
  • Typed ElevenLabs model literal — eleven_v3 / eleven_flash_v2_5 / eleven_turbo_v2_5 / eleven_multilingual_v2 / eleven_monolingual_v1.
  • Cerebras: response_format (JSON mode + structured outputs), parallel_tool_calls, tool_choice, seed, top_p, frequency_penalty, presence_penalty, stop, User-Agent telemetry, max_completion_tokens, gzip compression on by default in TypeScript (parity with Python).
  • Same OpenAI-spec sampling kwargs lifted to the OpenAILLMProvider parent so every OpenAI-compat client benefits.
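Lifting the sampling kwargs to a shared parent means every OpenAI-compatible client builds its request body the same way. A minimal sketch, assuming a hypothetical `build_chat_body` helper that rejects unknown keys and drops unset (`None`) values; the `"stream": True` default is also an assumption.

```python
from typing import Any

# OpenAI chat-completions fields listed in these notes.
SAMPLING_KEYS = {
    "response_format", "parallel_tool_calls", "tool_choice", "seed", "top_p",
    "frequency_penalty", "presence_penalty", "stop", "max_completion_tokens",
}


def build_chat_body(model: str, messages: list, **sampling: Any) -> dict:
    """Build a /chat/completions body, forwarding only set, known kwargs."""
    unknown = set(sampling) - SAMPLING_KEYS
    if unknown:
        # Fail loudly on typos such as "temprature" instead of dropping them.
        raise TypeError(f"unsupported sampling kwargs: {sorted(unknown)}")
    body = {"model": model, "messages": messages, "stream": True}
    # Skip None so provider defaults apply to anything the caller left unset.
    body.update({k: v for k, v in sampling.items() if v is not None})
    return body
```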

Observability

  • LatencyBreakdown extended with endpoint_ms, bargein_ms, tts_total_ms, properly split llm_ttft_ms / llm_total_ms.
  • New aggregate: latency_p90 alongside P50 / P95 / P99.
  • New OTel spans getpatter.endpoint and getpatter.bargein. The pre-existing getpatter.llm span is now actually emitted around the pipeline LLM call.
  • New EventBus event types: transcript_partial, transcript_final, llm_chunk, tts_chunk, tool_call_started.
  • TS span names normalised to getpatter.* everywhere.
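The new latency_p90 aggregate sits alongside P50/P95/P99. The sketch below computes all four with the nearest-rank method; Patter may use an interpolating method instead, and `percentile` / `latency_summary` are hypothetical names.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty sample."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing to avoid float error on round percentiles.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> dict:
    return {f"latency_p{p}": percentile(latencies_ms, p) for p in (50, 90, 95, 99)}
```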

Compatibility

  • Python ≥ 3.11.
  • Node ≥ 18.
  • No public-API breaks. Anything that was deprecated in 0.5.x continues to work; the new factories (for_twilio() / for_telnyx()) are opt-in.

Full changelog

See CHANGELOG.md and PR #76.

v0.5.2

23 Apr 13:13
81358ae


Fixed

  • ElevenLabs default voice: Rachel → Sarah. Rachel (21m00Tcm4TlvDq8ikWAM) is a library voice that free-tier ElevenLabs accounts cannot use via the API, so ElevenLabsTTS() (Python) / new ElevenLabsTTS() (TypeScript) without an explicit voice_id used to fail on the first synthesis with 402 paid_plan_required. The default is now Sarah (EXAVITQu4vr4xnSDxMaL), a premade voice available to every account.
  • alloy alias now resolves to Sarah for the same reason.
  • rachel alias still resolves to her original ID — pass voice="rachel" explicitly to keep using her (requires a paid ElevenLabs plan).
  • Added sarah alias alongside the existing bella (same voice ID).
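The alias rules above reduce to a small lookup table. The voice IDs come from these notes; treating aliases case-insensitively and passing unknown strings through as raw voice IDs are assumptions of this hypothetical `resolve_voice` sketch.

```python
from typing import Optional

SARAH = "EXAVITQu4vr4xnSDxMaL"   # premade, available to every account
RACHEL = "21m00Tcm4TlvDq8ikWAM"  # library voice, paid plans only

VOICE_ALIASES = {
    "sarah": SARAH,
    "bella": SARAH,   # same voice ID as sarah
    "alloy": SARAH,
    "rachel": RACHEL,
}


def resolve_voice(voice: Optional[str] = None) -> str:
    """Map an alias to a voice ID; default to Sarah when nothing is given."""
    if not voice:
        return SARAH
    # Unknown values are assumed to already be raw ElevenLabs voice IDs.
    return VOICE_ALIASES.get(voice.lower(), voice)
```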

Changed (UX)

  • Startup banner now renders at the top of the terminal output (before tunnel / webhook setup logs), with a visually distinct Dashboard section using box-drawing separators.
  • Reduced per-frame log noise during calls: removed WS event:, Telnyx event:, Upgrade request:, WebSocket connected: lines. Only Call started / Call ended remain on the happy path.

Install

```bash
pip install --upgrade getpatter==0.5.2
npm install getpatter@0.5.2
```

Full changelog: v0.5.1...v0.5.2

v0.5.1 — First-class llm= selector + 5 LLM providers

22 Apr 19:59
85e6294


Patter 0.5.1 adds llm= as a first-class selector on phone.agent(), mirroring the stt= / tts= / engine= pattern shipped in 0.5.0. Five LLM provider classes with env-var fallback ship in both SDKs:

```python
import asyncio

from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS


async def main() -> None:
    phone = Patter(carrier=Twilio(), phone_number="+15550001234")
    agent = phone.agent(
        stt=DeepgramSTT(),                    # DEEPGRAM_API_KEY
        llm=AnthropicLLM(),                   # ANTHROPIC_API_KEY
        tts=ElevenLabsTTS(voice_id="rachel"), # ELEVENLABS_API_KEY
        system_prompt="You are helpful.",
    )
    await phone.serve(agent)


asyncio.run(main())
```

TypeScript mirror uses new AnthropicLLM().

What's new

  • OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, GoogleLLM — namespaced classes (getpatter.llm.{vendor}.LLM in Python, getpatter/llm/{vendor} in TS) plus flat aliases from the package root.
  • Env-var fallback per provider — OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, CEREBRAS_API_KEY, GEMINI_API_KEY (falls back to GOOGLE_API_KEY).
  • Tool calling works across all 5 providers — each adapter normalizes vendor-specific tool formats to Patter's unified chunk protocol.
  • llm= and on_message are mutually exclusive — conflict raises a clear error at serve() time; engine + llm logs a one-time warning (engine owns the LLM).
  • Clean logging — two INFO lines per call (start + end):
    • Call started: CAxxx (Twilio, engine=openai_realtime, +15550001 → +15550002)
    • Call ended: CAxxx (42.3s, 8 turns, cost=$0.0127, p95=612ms)
    • Everything else (STT transcripts, barge-in, hallucination filter, DTMF, per-turn guardrail triggers) demoted to debug.

Fixes

  • TypeScript bundler (critical): tsup was bundling cloudflared into the ESM dist. Since cloudflared is CJS and calls require("path") at runtime, serve({ tunnel: true }) crashed with Dynamic require of "path" is not supported. Fixed by externalizing cloudflared and @ngrok/ngrok in a new tsup.config.ts so they resolve from the consumer's node_modules/ at runtime.
  • CI: anthropic / google-genai optional-extras tests now skip gracefully on the base Python matrix (they still run end-to-end in the python-all-extras job).
  • Parity: provider="pipeline" is now derived in the TypeScript client when the user passes only llm= without an engine, matching the Python client.

Install

```bash
pip install getpatter                      # 0.5.1
pip install "getpatter[anthropic]"         # + Anthropic adapter
pip install "getpatter[google]"            # + Google Gemini
pip install "getpatter[tunnel]"            # + cloudflared for dev tunnels

npm install getpatter                      # 0.5.1
npm install cloudflared                    # optional, only if using tunnel: true
```

Validation

  • Python: 1350 pytest passed, 8 skipped, 0 failures (1327 baseline + 23 new in tests/unit/test_llm_api.py)
  • TypeScript: 1042 vitest passed across 61 files, tsc --noEmit clean (1013 baseline + 29 new in tests/unit/llm-api.test.ts)
  • 4-line pipeline-mode quickstart with llm=AnthropicLLM() verified in both SDKs

Unchanged

  • Default OpenAI LLMLoop still auto-constructs when llm= is absent and openai_key is present — zero break from 0.5.0.
  • on_message callback still works for custom LLM logic (multi-model routing, local llama.cpp, etc.). Mutually exclusive with llm=.

Links