
fix(pipeline): arm barge-in for the whole carrier-buffered playback window #164

Merged: nicolotognoni merged 2 commits into main from fix/hermes-bargein-buffered-playback on Jun 10, 2026

@nicolotognoni

Collaborator

Summary

  • Fixes the live Hermes/OpenClaw bug where the agent detects the barge-in but keeps talking: with an agent-runtime LLM the whole (long) reply arrives at once, TTS pushes it to the carrier far faster than realtime, and the carrier keeps playing tens of seconds of buffered audio after the SDK already considers the turn over.
  • The speaking state used to end a fixed PATTER_TTS_TAIL_GRACE_MS (1.5 s) after the last push, not the last playback — so for most of the audible reply _is_speaking was already false, user speech was classified as a calm next turn, send_clear never fired, and the buffered reply played on (with the next turn's answer queued behind it).
  • Token-paced LLMs (OpenAI & co.) stay byte-identical: with no carrier backlog the tail grace starts immediately, exactly as before.

Implementation

  • Estimated playback cursor _playback_buffered_until (Py) / playbackBufferedUntil (TS), advanced by _track_outbound_playback / trackOutboundPlayback on every pipeline TTS chunk pushed to the carrier, at the chunk's real byte rate: PCM16 @ 16 kHz = 32 B/ms (default path, Telnyx native) or carrier-native μ-law @ 8 kHz = 8 B/ms (Twilio/Plivo ulaw_8000).
  • Two-phase _end_speaking_with_grace / endSpeakingWithGrace — phase 1 holds _is_speaking=true with _tail_grace_active=false for the whole estimated backlog, keeping VAD/transcript barge-in armed on the cancel path (full cancel + send_clear, which drops the carrier buffer instantly — Twilio media-stream clear semantics); phase 2 is the unchanged echo-tail grace.
  • Cursor resets on every barge-in cancel path (transcript, VAD-immediate, Python _do_cancel_for_barge_in now also clears the pending grace task — TS already invalidates it via the generation bump) and on endTailGraceForNewTurn.
  • Industry-standard semantics: stop + flush queued audio client-side regardless of LLM state (cf. Pipecat/ElevenLabs interruption handling, Retell stale-turn dropping). Aborting the in-flight request already worked (both Hermes and OpenClaw cancel the server-side run on streaming disconnect); the missing half was the carrier-side audio backlog.
  • Files: libraries/python/getpatter/stream_handler.py, libraries/typescript/src/stream-handler.ts, mirrored unit tests, CHANGELOG.md.
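
The cursor arithmetic described above can be sketched as follows. This is a minimal illustration, not the SDK's code: the class name `PlaybackCursor`, its methods, and the injectable clock are hypothetical, and only the two byte rates come from the description.

```python
import time

# Byte rates quoted in the PR description.
BYTES_PER_MS = {
    "pcm16_16000": 32,  # 16-bit PCM at 16 kHz: 16000 samples/s * 2 B = 32 B/ms
    "ulaw_8000": 8,     # 8-bit mu-law at 8 kHz: 8000 samples/s * 1 B = 8 B/ms
}


class PlaybackCursor:
    """Hypothetical helper: estimates when the carrier finishes queued audio."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.buffered_until = 0.0  # seconds on the injected clock

    def track(self, chunk: bytes, audio_format: str = "pcm16_16000") -> None:
        """Advance the cursor by the playout duration of one pushed chunk."""
        duration_s = len(chunk) / BYTES_PER_MS[audio_format] / 1000.0
        # New audio queues behind whatever is still buffered; if the carrier
        # has already drained, playback of this chunk starts now.
        start = max(self.buffered_until, self._clock())
        self.buffered_until = start + duration_s

    def backlog_s(self) -> float:
        """Seconds of audio the carrier is still estimated to hold."""
        return max(0.0, self.buffered_until - self._clock())

    def reset(self) -> None:
        """Called on a barge-in cancel: the carrier buffer was cleared."""
        self.buffered_until = 0.0
```

With a token-paced LLM the chunks arrive roughly in real time, so `backlog_s()` stays near zero; with a whole reply pushed at once it grows to the full length of the reply.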

Breaking change?

No. No new config; no default changes. PATTER_TTS_TAIL_GRACE_MS=0 still forces the legacy synchronous flip (tests/soak escape hatch).

Test plan

  • Python: pytest tests/ -m 'not soak' — 2371 passed, 0 failed (incl. new tests/unit/test_pipeline_bargein_buffered.py, 11 tests, RED without the fix: 10/11 fail)
  • TypeScript: npm test — 1835 passed, 0 failed (incl. new tests/unit/pipeline-bargein-buffered.test.ts, 12 tests, RED without the fix: 12/12 fail) + npm run lint + npm run build
  • Parity: identical state machine, naming map (_playback_buffered_until ↔ playbackBufferedUntil), defaults and tests in both SDKs
  • E2E smoke on a live Hermes call (pipeline/handler change — recommend one manual interrupted-call check before release)

Docs updates

  • N/A (internal behaviour fix; CHANGELOG.md Unreleased → Fixed entry added). Integration docs already describe barge-in as Patter-owned.

🤖 Generated with Claude Code

…indow

Agent-runtime LLMs (HermesLLM/OpenClawLLM) deliver the whole reply at once
after a long thinking pause, so TTS outruns realtime and the carrier ends up
holding tens of seconds of queued audio. The speaking state ended a fixed
1.5s grace after the last *push*, not the last *playback* — for most of the
audible reply isSpeaking was already false, VAD/transcript events were
treated as a calm next turn, send_clear never fired, and the agent 'detected
the barge-in but kept talking'.

Track an estimated playback cursor (_playback_buffered_until /
playbackBufferedUntil) advanced per pushed chunk at its real byte rate
(PCM16@16kHz = 32 B/ms, carrier-native mulaw@8kHz = 8 B/ms), and split
end-speaking-with-grace into two phases: phase 1 holds isSpeaking=true with
tailGraceActive=false for the whole estimated backlog (barge-in stays armed
and takes the full cancel + send_clear path, dropping the carrier buffer);
phase 2 is the unchanged echo-tail grace. Barge-in cancels reset the cursor.

No new config; token-paced LLMs (no backlog) behave identically to before,
and PATTER_TTS_TAIL_GRACE_MS=0 still forces the legacy synchronous flip.
Full Python/TS parity with mirrored unit tests (RED without the fix).
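
The two-phase split in this commit message can be illustrated with the sketch below. It is an assumption-laden outline, not the handler's implementation: `SpeakingState`, its attribute names, and the injectable `sleep` are hypothetical, and the zero-grace branch reflects one reading of "legacy synchronous flip".

```python
import asyncio


class SpeakingState:
    """Hypothetical two-phase end-of-speech state (names are illustrative)."""

    def __init__(self, tail_grace_s: float = 1.5, sleep=asyncio.sleep):
        self.tail_grace_s = tail_grace_s
        self._sleep = sleep
        self.is_speaking = False
        self.tail_grace_active = False

    async def end_speaking_with_grace(self, backlog_s: float) -> None:
        if self.tail_grace_s <= 0:
            # Assumed escape hatch: flip immediately, no grace phases.
            self.is_speaking = False
            return
        # Phase 1: the carrier is still playing buffered audio, so the agent
        # counts as speaking and barge-in stays on the full cancel path.
        self.is_speaking, self.tail_grace_active = True, False
        await self._sleep(backlog_s)
        # Phase 2: echo-tail grace after the last audible audio.
        self.tail_grace_active = True
        await self._sleep(self.tail_grace_s)
        self.is_speaking, self.tail_grace_active = False, False
```

With `backlog_s` equal to zero, phase 1 is empty and the tail grace starts at once, which matches the claim that token-paced LLMs behave as before.
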
…it-style truncation)

Two gaps with agent-runtime LLMs (Hermes/OpenClaw), building on the
playback cursor from the previous commit:

- Mid-turn barge-in: the whole reply was already synthesized into the
  carrier buffer, so the '[interrupted by caller]' marker was appended to
  the FULL text — a stateful runtime believed the caller heard everything.
- Post-complete barge-in (during the buffered tail): no marker at all —
  history kept the full reply the caller never finished hearing.

Track per-turn (sentence, playback_start) segments at each sentence's
first audible chunk (filler and llm_error_message audio advance the clock
but add no segment), map heard = total_pushed - carrier_backlog to a
sentence-granular prefix, and: (a) the streaming path records
'<heard prefix> [interrupted by caller]'; (b) the barge-in cancel paths
rewrite the last assistant history entry the same way before clearing the
buffer. Legacy full-text marker preserved when no segments were tracked.
Full Python/TS parity with mirrored unit tests.
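
The mapping from playback estimate to a heard prefix can be sketched as below. The function names and the tuple layout are hypothetical, and the rule that a sentence counts as heard once its first chunk has started playing is an assumption; the commit message only says the prefix is sentence-granular.

```python
from typing import List, Tuple

INTERRUPTED_MARKER = "[interrupted by caller]"

# Each segment is (sentence, playback_start_s): the amount of audio already
# pushed in this turn when the sentence's first chunk was queued. Filler audio
# advances these offsets but contributes no segment of its own.
Segment = Tuple[str, float]


def heard_prefix(segments: List[Segment], total_pushed_s: float,
                 carrier_backlog_s: float) -> str:
    """Join the sentences whose playback had started before the interruption."""
    heard_s = max(0.0, total_pushed_s - carrier_backlog_s)
    return " ".join(text for text, start_s in segments if start_s < heard_s)


def interrupted_entry(full_text: str, segments: List[Segment],
                      total_pushed_s: float, carrier_backlog_s: float) -> str:
    """Build the assistant history entry recorded after a barge-in."""
    if not segments:
        # Legacy behaviour: no segments tracked, keep the full text.
        return f"{full_text} {INTERRUPTED_MARKER}"
    prefix = heard_prefix(segments, total_pushed_s, carrier_backlog_s)
    return f"{prefix} {INTERRUPTED_MARKER}".strip()
```
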
@nicolotognoni

Collaborator Author

Added a second commit (24184a0): heard-prefix truncation (LiveKit-style). On any barge-in the history now records only the reply prefix the caller actually heard — estimated from the playback cursor at sentence granularity — instead of the full generated text; barge-ins landing after turn-complete (during the buffered tail) now rewrite the last assistant entry the same way. Filler/llmErrorMessage audio advances the playback clock but is never recorded as reply text. Python 2380 / TS 1843 green locally.

🤖 Generated with Claude Code

@nicolotognoni nicolotognoni merged commit a0ef4c2 into main Jun 10, 2026
10 checks passed
@nicolotognoni nicolotognoni deleted the fix/hermes-bargein-buffered-playback branch June 10, 2026 21:21
@FrancescoRosciano FrancescoRosciano mentioned this pull request Jun 10, 2026
4 tasks
nicolotognoni pushed a commit that referenced this pull request Jun 10, 2026
Opt-in agent.barge_in_mode="pause_resume" (default "cancel" keeps
today's behaviour byte-identical). LiveKit-style state machine on VAD
speech_start while the agent speaks:

- PAUSE: gate the sentence/audio send loops on _output_paused and
  send_clear the carrier so queued audio stops within a frame. The LLM
  stream and TTS provider stream stay alive: sentences buffer as text
  (capped at 32) and synthesized audio queues into per-sentence
  retention entries (capped at ~15 s of playout; overflow while paused
  degrades to a full cancel, overflow while speaking releases retention
  for the turn). Mic audio flows to STT while paused (output is silent,
  so the line is echo-quiet) and the inbound ring is flushed so the
  confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination,
  non-duplicate — the existing _handle_barge_in/_commit_transcript
  filter family) within barge_in_confirm_ms (default 1500 ms) runs the
  existing _do_cancel_for_barge_in path and discards the paused
  buffers. The overlap window anchored at pause time is preserved so
  InterruptionMetrics.detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the
  cleared-but-unheard tail from retained audio at SENTENCE granularity
  (first sentence not fully played, derived from the #164
  _playback_buffered_until cursor + heard-prefix segments; the
  partially-played sentence replays from its start) without re-billing
  TTS, then release the buffered sentences through the normal synth
  path. Recorded as a false interruption via
  record_overlap_end(was_interruption=False) — the backchannel
  counter, never an interruption — plus a false_interruption event.

The playback bookkeeping is frozen at the heard offset on pause so a
kill still rewrites history to the heard prefix; on resume the replay
re-stamps segments so later barge-ins stay accurate. Turn bodies wait
out an in-flight pause decision before ending (bounded by the confirm
window) so buffered sentences are never orphaned.
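
The PAUSE / KILL / RESUME decision and the sentence-granular resume point can be outlined as pure functions. This is only a sketch under stated assumptions: `Decision`, `pause_decision`, and `resume_index` are hypothetical names, timestamps are plain milliseconds, and the transcript filters (echo, hallucination, duplicate) are assumed to have run before `confirmed_final_at_ms` is set.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    PENDING = "pending"  # still inside the confirm window
    KILL = "kill"        # confirmed barge-in: run the full cancel path
    RESUME = "resume"    # false interruption: replay the unheard tail


def pause_decision(paused_at_ms: float, now_ms: float,
                   confirmed_final_at_ms: Optional[float],
                   confirm_ms: float = 1500.0) -> Decision:
    """Decide what to do with a paused turn."""
    deadline = paused_at_ms + confirm_ms
    if confirmed_final_at_ms is not None and confirmed_final_at_ms <= deadline:
        return Decision.KILL
    if now_ms >= deadline:
        return Decision.RESUME
    return Decision.PENDING


def resume_index(segments: List[Tuple[str, float, float]], heard_s: float) -> int:
    """Index of the first sentence not fully played.

    Each segment is (sentence, start_s, end_s) in playout time. A partially
    played sentence is replayed from its start, so the comparison uses end_s.
    """
    for index, (_sentence, _start_s, end_s) in enumerate(segments):
        if end_s > heard_s:
            return index
    return len(segments)
```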

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
nicolotognoni pushed a commit that referenced this pull request Jun 10, 2026
…ipt)

TypeScript port of 3877814 — exact parity with the Python semantics,
defaults, and events (camelCase ↔ snake_case naming):

Opt-in agent.bargeInMode: 'pause_resume' (default 'cancel' keeps today's
behaviour byte-identical). LiveKit-style state machine on VAD
speech_start while the agent speaks:

- PAUSE: gate the sentence/audio send loops on outputPaused and
  sendClear the carrier so queued audio stops within a frame. The LLM
  stream and TTS provider stream stay alive: sentences buffer as text
  (capped at 32) and synthesized audio queues into per-sentence
  retention entries (capped at ~15 s of playout; overflow while paused
  degrades to a full cancel, overflow while speaking releases retention
  for the turn). Mic audio flows to STT while paused and the inbound
  ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination,
  non-duplicate — the existing handleBargeIn/commitTranscript filter
  family) within bargeInConfirmMs (default 1500 ms) runs the existing
  runBargeInCancel path and discards the paused buffers. The overlap
  window anchored at pause time is preserved so detection_delay
  measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the
  cleared-but-unheard tail from retained audio at SENTENCE granularity
  (first sentence not fully played, derived from the #164
  playbackBufferedUntil cursor + heard-prefix segments; the
  partially-played sentence replays from its start) without re-billing
  TTS, then release the buffered sentences through the normal synth
  path. Recorded as a false interruption via recordOverlapEnd(false) —
  the backchannel counter, never an interruption — plus a
  'false_interruption' event ({ resumedSentences }).

The playback bookkeeping is frozen at the heard offset on pause so a
kill still rewrites history to the heard prefix; on resume the replay
re-stamps segments so later barge-ins stay accurate. Turn bodies wait
out an in-flight pause decision before ending — completes the
predecessor's in-progress port by bounding awaitPauseDecision (confirm
window + 5 s fail-open margin, mirroring Python's
_await_pause_decision) so a teardown race can never strand the
dispatch loop.

Tests mirror tests/unit/test_barge_in_pause_resume.py: pause gates
without cancelling, paused buffering + overflow degradation, resume
tail replay + false-interruption metrics/event, kill filters
(final-only / hallucination / duplicate / frozen-prefix history
rewrite), legacy cancel mode untouched, config-off defaults,
streaming-loop integration (resume, kill, stream-ends-paused), and
teardown mid-pause.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
FrancescoRosciano added a commit that referenced this pull request Jun 11, 2026
…mantic EOU + review-wave fixes (both SDKs) (#169)

* fix(providers): add missing Union import in 8 provider modules

Several STT/TTS/realtime provider option classes reference Union[...] in
type annotations but never import it. `from __future__ import annotations`
masked the omission at import time, but typing.get_type_hints() and other
runtime annotation introspection (Pydantic, docs tooling, inspect with
eval_str=True) raised `NameError: name 'Union' is not defined`.

Affected: assemblyai_stt, cartesia_stt, soniox_stt, whisper_stt, rime_tts,
lmnt_tts, gemini_live, ultravox_realtime. Python-only fix (TS unaffected).
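
The failure mode is easy to reproduce in isolation. The sketch below uses a made-up `transcribe` function compiled into its own namespace (so the missing import is guaranteed); it is not one of the affected provider modules.

```python
import typing

# A stand-in module body: Union is used in an annotation but never imported.
SOURCE = '''
from __future__ import annotations

def transcribe(language: Union[str, None] = None) -> str:
    return language or "auto"
'''

namespace: dict = {}
exec(SOURCE, namespace)  # succeeds: annotations are stored as plain strings
transcribe = namespace["transcribe"]

try:
    typing.get_type_hints(transcribe)  # evaluates the strings at runtime
    introspection_error = None
except NameError as exc:
    introspection_error = str(exc)

# The fix is simply to make the name available in the module's globals,
# which is what adding `from typing import Union` does.
namespace["Union"] = typing.Union
resolved = typing.get_type_hints(transcribe)
```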

https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P

* fix(llm): surface HTTP errors from non-OpenAI providers (TS)

The TypeScript Anthropic/Google/Groq/Cerebras providers returned silently
on a non-2xx LLM response instead of throwing. Two regressions followed:

  - FallbackLLMProvider treated a generator that completed with zero
    chunks as success, so it never failed over to the next provider.
  - The stream handler only speaks `agent.llmErrorMessage` when the LLM
    loop throws, so a silent return produced dead air on the call.

Python (anthropic/google via vendor SDKs, groq/cerebras via the openai
SDK) already raises on HTTP errors, and the TS OpenAI provider already
throws PatterConnectionError — these four were the outliers. Make them
throw PatterConnectionError too, and cap the logged/thrown error body to
200 chars (provider 401 bodies have been observed to embed the rejected
API-key prefix).

Updates the two Cerebras tests that asserted the old silent-drain
behaviour to expect the throw while still verifying the recovery-hint log.

https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P

* fix(python): resolve undefined names in public type annotations + export gaps

- client.py: PipelineHooks/ConsultConfig/CallResult/RealtimeTurnDetection
  (models) and VADProvider/AudioFilter/BackgroundAudioPlayer (providers.base)
  were referenced in Patter.agent()'s signature but never imported — IDEs and
  typing.get_type_hints() raised NameError on the SDK's main entry point.
  Tool/SpeechEventCallback move from TYPE_CHECKING to runtime imports (no
  cycle), so get_type_hints(Patter.agent) now fully resolves.
- models.py: BargeInStrategy added to the TYPE_CHECKING block (same bug).
- google_llm.py: missing Union import (companion to the earlier 8-module
  fix), drop dead api_key local, unshadow call_id loop variable.
- __init__.py: 53 provider option enums were re-exported but missing from
  __all__ (import * / doc tooling missed them); stt/tts package __all__
  gain openai_transcribe, elevenlabs_ws, inworld.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(llm): barge-in/tool-dispatch safety, provider protocol fixes, idle stream timeout

Python:
- llm_loop: when cancel_event fires mid-stream every provider returns
  cleanly, leaving truncated tool-call JSON accumulated — the loop then
  executed those tools with {} arguments after the caller interrupted
  (transfer/SMS/booking firing with empty payloads). Bail out before tool
  dispatch on cancel, and answer malformed-JSON tool calls with an error
  envelope instead of executing with guessed arguments.
- stream_handler/test_mode: history was snapshotted AFTER pushing the
  current user turn while LLMLoop._build_messages appends user_text
  itself — every request carried the user utterance twice.
- cerebras: 404 model_not_found was swallowed (empty stream looks like
  success → no fallback failover, no spoken llm_error_message, dead air).
  Now logs the recovery hint and re-raises, mirroring TS; test updated.
- anthropic/google: prepend a synthetic user turn when history starts
  with the first_message greeting (Messages API requires user-first;
  Gemini same shape), map Gemini functionResponse.name back to the real
  function name via the paired functionCall (spec requires the names to
  match), subtract cached tokens from Gemini input usage.
- chat_context: to_anthropic folded role:"tool" entries into user turns
  (Anthropic 400s on tool role); truncate drops leading orphan tool
  results (bare tool_call_id 400s on OpenAI).
- fallback_provider: forward caller/callee to delegates and only pass the
  context kwargs each delegate's stream() declares — a minimal custom
  provider no longer TypeErrors on every attempt (availability flapping).
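
The fallback_provider item can be illustrated with `inspect.signature`. This is a sketch of the general technique, assuming nothing about the SDK's internals: `supported_kwargs` and the two provider classes are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def supported_kwargs(func: Callable[..., Any], **context: Any) -> Dict[str, Any]:
    """Keep only the context kwargs that `func` actually declares."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(context)  # func accepts **kwargs, so pass everything
    return {name: value for name, value in context.items() if name in params}


class MinimalProvider:
    """A custom provider that knows nothing about call context."""

    def stream(self, messages):
        return list(messages)


class ContextAwareProvider:
    def stream(self, messages, caller=None, callee=None):
        return [caller, callee]


context = {"caller": "+15550100", "callee": "+15550101"}
minimal_result = MinimalProvider().stream(
    ["hi"], **supported_kwargs(MinimalProvider().stream, **context)
)
aware_result = ContextAwareProvider().stream(
    ["hi"], **supported_kwargs(ContextAwareProvider().stream, **context)
)
```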

TypeScript (mirrors where applicable):
- replace the fixed 30 s whole-stream LLM ceiling with an idle watchdog
  (createStreamIdleWatchdog, re-armed per chunk) in OpenAI/Anthropic/
  Google/Groq/Cerebras/OpenAI-compatible providers; idle aborts now throw
  PatterConnectionError instead of surfacing as a fake barge-in AbortError
  (parity: Python has no whole-stream ceiling).
- anthropic: handle in-band SSE error events (overloaded_error) by
  throwing instead of ending the stream as success; user-first guard.
- google: user-first guard, functionResponse.name mapping, cached-token
  subtraction. groq/cerebras shared parser: subtract cached tokens from
  prompt_tokens (was double-billing cache reads).
- chat-context: same to_anthropic/truncate fixes as Python.
- tests updated to the new contracts + new watchdog unit tests.
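
The difference between a whole-stream ceiling and an idle watchdog is sketched below in Python asyncio for illustration. It does not reproduce the TypeScript `createStreamIdleWatchdog`; `with_idle_timeout` and `StreamIdleTimeout` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class StreamIdleTimeout(Exception):
    """Raised when no chunk arrives within the idle budget."""


async def with_idle_timeout(stream: AsyncIterator[T],
                            idle_s: float) -> AsyncIterator[T]:
    """Re-yield `stream`, failing only if the gap between chunks exceeds idle_s.

    The timer is re-armed for every chunk, so a long reply that keeps
    producing output is never cut off by its total duration.
    """
    iterator = stream.__aiter__()
    while True:
        try:
            item = await asyncio.wait_for(iterator.__anext__(), timeout=idle_s)
        except StopAsyncIteration:
            return  # the upstream finished normally
        except asyncio.TimeoutError:
            # A distinct error type keeps an idle abort from being mistaken
            # for a caller-initiated cancellation.
            raise StreamIdleTimeout(f"no chunk for {idle_s} s") from None
        yield item
```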

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(dashboard): ingest crash, phantom live rows, SSE freeze, parity gaps

- store.py: record_call_end crashed with TypeError when the standalone
  ingest passed metrics as a plain dict (asdict on non-dataclass) — and
  the exception fired AFTER the active row was popped, so every completed
  call vanished from the standalone dashboard at hangup. Accept dicts.
- store.py: update_call_status now copies the live transcript/turns into
  the terminal entry (TS already did) — the Twilio statusCallback vs WS
  stop race no longer blanks the transcript pane.
- both stores: add Plivo 'timeout'/'cancel' to the terminal status set —
  rows for unanswered/cancelled Plivo dials leaked in the active set
  forever (phantom live call).
- both servers: Telnyx call.hangup with a no-media cause (busy/no-answer/
  rejected) now terminal-izes the pre-registered dashboard row — same
  permanent active-set leak.
- store.py SSE: a force-dropped slow subscriber now receives a close
  sentinel so its generator ends and EventSource reconnects — previously
  the dashboard froze forever while showing 'streaming · sse'.
- cli ingest (both SDKs): a finished-call payload is no longer replayed
  as a fresh call_start (spurious SSE event + started_at = ingest time);
  stores derive started_at from the metrics duration when absent.
- cli.ts: raise express.json body cap to 5 MB (long-call ingests 413'd
  and silently vanished).
- api_routes.py: /api/v1/calls/{id} falls back to the active set (TS
  parity). routes.py: clamp negative ?limit; interpret date-only export
  filters as UTC like JS Date (same query returned different ranges per
  SDK).
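
The record_call_end crash comes from `dataclasses.asdict`, which raises TypeError for anything that is not a dataclass instance. A tolerant conversion might look like the sketch below; `metrics_to_dict` and `CallMetrics` are hypothetical stand-ins, not the store's code.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Dict, Mapping


@dataclass
class CallMetrics:
    """Illustrative stand-in for the SDK's metrics dataclass."""
    duration_s: float
    turns: int


def metrics_to_dict(metrics: Any) -> Dict[str, Any]:
    """Accept a dataclass instance, a plain mapping, or None."""
    if metrics is None:
        return {}
    if is_dataclass(metrics) and not isinstance(metrics, type):
        return asdict(metrics)
    if isinstance(metrics, Mapping):
        return dict(metrics)  # e.g. a payload from the standalone ingest
    raise TypeError(f"unsupported metrics type: {type(metrics).__name__}")
```

Running the conversion before the active row is removed would also keep a failure from making the call disappear, since the exception then fires while the row is still in place.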

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(telephony/server): Telnyx outbound media, raw Ed25519 keys, Plivo wait, WS hardening

Cross-carrier correctness fixes confirmed by the deep review (several found
independently by two reviewers):

- Telnyx outbound calls never got a media stream in EITHER SDK: a perf
  refactor folded streaming_start into actions/answer on call.initiated,
  but Answer is only valid on incoming legs and call.answered had become a
  no-op — callees answered to dead air. Outgoing legs now skip the answer
  and attach the stream via actions/streaming_start on call.answered.
- Telnyx webhook signature validation only accepted DER/SPKI public keys,
  but the Telnyx portal issues TELNYX_PUBLIC_KEY as base64 of the RAW
  32-byte Ed25519 key — every webhook 403'd (fail-closed) the moment the
  documented security feature was enabled. Both forms now verify; tests
  cover the raw form.
- Plivo call(wait=True) could never resolve: completions/AMD/prewarm were
  keyed by the dial-time request_uuid while every webhook carries the live
  CallUUID. The answer webhook now re-keys all per-call bookkeeping
  (alias_call_id / aliasCallId + client prewarm re-key); the TS Plivo
  branch also actually routes through maybeAwaitCompletion (wait was
  silently ignored).
- TS carrier WS: no 'error' listener (an ECONNRESET became an
  uncaughtException killing every live call), unguarded async 'close'
  listeners (throwing onCallEnd → unhandled rejection → crash), and ws@8
  invoking async listeners unawaited (interleaved handleAudio → VAD state
  races, out-of-order STT). All three carrier streams now serialize events
  onto a per-connection FIFO with contained errors + error listeners.
- Per-IP WS cap counted the tunnel's loopback peer: hard ceiling of 10
  concurrent calls behind cloudflared/ngrok and a trivial shared-bucket
  DoS. Loopback peers now key on CF-Connecting-IP / X-Forwarded-For.
- Voicemail drops (Telnyx/Plivo, both SDKs) were awaited inline in webhook
  handlers including a playback sleep of up to 30 s — carriers timed out
  and retried, double-speaking the message. Now tracked fire-and-forget
  tasks; the Telnyx drop also moves from the early
  call.machine.detection.ended to call.machine.greeting.ended (the beep),
  so the message is no longer clipped mid-greeting; playback estimate
  constants aligned (were 2x apart between SDKs).
- machine_end_other now triggers voicemail-drop/prewarm-evict like the
  other machine_end_* outcomes (both SDKs).
- Telnyx configure_number PATCHed connection_id to /phone_numbers/{id}/voice
  which silently ignores it (auto-config 'succeeded' but inbound never
  routed) — association now goes to PATCH /phone_numbers/{id} (all 3 impls).
- Python: completion futures resolve in finally around user on_call_end
  (a throwing callback stranded wait=True until the 30-min backstop; TS
  same); serve() no longer crashes on Windows (add_signal_handler);
  call(from_number=...) was always ignored (config value won the or);
  webhook_url now normalised to a bare hostname (schemed values built
  wss://https://... URLs); outbound Telnyx/Plivo dials leaked a pooled
  httpx client per call; bridges resolve direction from the store instead
  of hardcoding 'inbound'; handler.cleanup() guarded in all three bridge
  finallys; WebSocketDisconnect no longer logged/recorded as a call error;
  Plivo bridge masks phone numbers in logs and sends the same
  transcript/conversation_history payload shape as Twilio/Telnyx.
- TS: recording: true now actually starts Plivo recording (worked in
  Python only).
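
For the Telnyx key item, the two encodings differ only by a fixed 12-byte DER header: an Ed25519 SubjectPublicKeyInfo is that header followed by the raw 32-byte key. The sketch below normalises either form using only the standard library. `raw_ed25519_key` is a hypothetical helper, and the signature check itself (which needs a cryptography library) is deliberately left out.

```python
import base64

# DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410):
# SEQUENCE { SEQUENCE { OID 1.3.101.112 }, BIT STRING (32 key bytes) }
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def raw_ed25519_key(b64_key: str) -> bytes:
    """Return the raw 32-byte public key from base64 of raw or SPKI DER bytes."""
    decoded = base64.b64decode(b64_key, validate=True)
    if len(decoded) == 32:
        return decoded  # already the raw key
    if len(decoded) == 44 and decoded.startswith(ED25519_SPKI_PREFIX):
        return decoded[len(ED25519_SPKI_PREFIX):]
    raise ValueError("expected a raw 32-byte or SPKI DER Ed25519 public key")
```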

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(telephony): telnyx on_call_end reads conversation_history from the handler

The bridge keeps no history deque of its own (Twilio/Plivo do) — the
parity addition referenced an undefined name, which the on_call_end
try/except silently swallowed, skipping the callback entirely.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(pipeline): core orchestrator correctness — both SDKs

Python stream handler:
- transcript/history deques: 'x or deque()' silently replaced the
  carrier-shared (empty → falsy) deques with private ones, so EVERY
  on_call_end payload carried an empty transcript and history. Use
  'is not None'.
- STT-connect-failure hangup called _hangup_fn(call_id) but the carrier
  hangup closures take no args — the TypeError was swallowed and the call
  stayed up, deaf.
- apply_call_overrides round-tripped the Agent through dataclasses.asdict,
  dict-ifying nested configs and deep-copying live provider objects — any
  per-call override from on_call_start crashed the call later. Use
  dataclasses.replace.
- _await_dispatch_settle: dispatch-turn failures were logged at DEBUG
  (callers heard silence, operators saw nothing) and CancelledError was
  swallowed even when the awaiting task itself was being cancelled,
  defeating teardown. cleanup() now also cancels the STT loop BEFORE the
  dispatch task so a racing transcript can't respawn an orphan turn, and
  guards each adapter close individually.
- user on_transcript/on_metrics callbacks are now exception-contained
  (_safe_on_* helpers): one raise inside the realtime forward loop
  permanently killed event forwarding (zombie call).
- mcp_servers were silently ignored in pipeline mode (only the realtime
  handler called _init_mcp_tools); pipeline start() now discovers MCP
  tools and cleanup() closes the sessions — matching the documented
  mode-agnostic contract and TS.
- realtime function_call: unknown/handler-less tools and malformed
  argument JSON now get an error-envelope function_result instead of
  silence (a dangling call item stalled the model: dead air). Mirrors TS.
- pipeline transfer_call validates E.164 BEFORE invoking the carrier
  transfer (which silently no-ops on bad targets) and returns the same
  rejection envelope as the realtime path.
- realtime guardrails: evaluate on accumulated text (per-delta checks
  never matched terms split across deltas), clear the carrier playout
  buffer on block, and speak the replacement via the no-fake-turn
  reassurance path — send_text injected it as a phantom role:user turn
  the model then replied to.
- barge-in: echo guard now runs BEFORE the tail-grace rescue (the grace
  window is exactly when the agent's final-sentence echo arrives — the
  rescue disarmed the downstream echo check and the agent answered its
  own words); duplicate/hallucination finals are filtered BEFORE
  cancelling (Deepgram's is_final twin of a just-committed speech_final
  cancelled the agent's brand-new turn); a strategy-confirmed barge-in
  now actually flushes the inbound ring, and the pending window forwards
  audio to STT — with strategies configured but forward-stt off, no
  transcript could ever arrive, so strategy barge-in was structurally
  impossible.
- firstMessage: history append no longer gated on metrics being enabled
  (model could re-greet), echo-guard reference now covers the greeting
  and non-streaming replies, prewarm pacing derives bytes/ms from the
  active output format (mulaw 8k prewarm bytes were paced 4x too fast,
  re-opening the barge-in flush window).
- STT send_audio failures degrade to dropped frames (rate-limited warn)
  instead of tearing the whole call down via the carrier read loop.
- remote_message: the 30 s asyncio.timeout spanned the generator's whole
  consumption INCLUDING TTS playback time of each yielded chunk — long
  spoken replies were cancelled mid-sentence with no log. Now a
  per-receive idle timeout.
- services: IVR loop detector compares the newest chunk to its immediate
  predecessor (max-over-window false-fired on alternating A/B prompts);
  scheduler cache stores (loop, scheduler) so a reallocated id() can't
  hand back a scheduler bound to a dead loop; markdown filter no longer
  eats all prose after a bare '<'.
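
The first item in this list (the `x or deque()` bug) is a general Python pitfall: an empty container is falsy, so `or` discards it. A minimal reproduction, with hypothetical function names:

```python
from collections import deque
from typing import Deque, Optional


def pick_history_buggy(shared: Optional[Deque[str]]) -> Deque[str]:
    # An empty deque is falsy, so a shared-but-still-empty deque is replaced
    # by a private one and later writes never reach the carrier's copy.
    return shared or deque()


def pick_history_fixed(shared: Optional[Deque[str]]) -> Deque[str]:
    # Only fall back when no deque was supplied at all.
    return shared if shared is not None else deque()


carrier_history: Deque[str] = deque()  # shared with the carrier bridge, empty at call start

buggy = pick_history_buggy(carrier_history)
buggy.append("user: hello")            # lands in a private deque

fixed = pick_history_fixed(carrier_history)
fixed.append("user: hello")            # lands in the shared deque
```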

TypeScript stream handler (mirrors + TS-specific):
- dispatchTask gets its rejection handler AT creation (dispatchTurn is
  try/finally only; the next turn's catch attached far too late for
  Node's unhandled-rejection check → process crash). fireCallEnd guards
  the user onCallEnd; processTranscript guards the user onTranscript.
- same echo-guard-before-tail-grace reorder; same firstMessage /
  runRegularLlm / WS-remote echo references; runRegularLlm returns its
  final text instead of the caller re-reading history[-1] (raced a
  concurrently committed user turn).
- WS-remote turns now honour barge-in at the outer loop (previously kept
  consuming the remote stream and started a fresh TTS synthesis per
  chunk after the interrupt) and only bill TTS/turn-complete on a clean
  finish. remote-message drains buffered frames after done/close (the
  old !done condition dropped every buffered chunk after the first).
- LLM tool loop: bail out before/between tool executions when the abort
  signal fired (parity with the Python cancel_event fix — no more side
  effects from truncated tool JSON after a barge-in).
- speechEvents threaded into StreamHandlerDeps (the public
  onUserSpeechStarted/.../onAudioOut API never fired on real served
  calls — only unit tests passed it).
- scheduler.scheduleOnce chains timeouts past Node's 2^31-1 ms clamp (a
  >24.8-day job fired immediately); IVR note*State respect stop();
  test-mode REPL survives provider errors.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(stt/audio): callback containment, sequential reuse, resampler boundary correctness

STT (TypeScript): every provider emit loop now contains BOTH sync throws
and async rejections from transcript callbacks — the registered callback
is async, so the bare cb(t) (or sync-only try/catch in Deepgram/
Speechmatics) left its rejection unhandled and killed the Node process on
any user-callback error.

STT (Python):
- whisper/telnyx/speechmatics adapters fully reset per-call state on
  connect(): close() left a closed httpx client, an unsent WAV-header
  flag and stale None/_STOP sentinels behind, so a sequential second call
  on the same adapter instance was deterministically broken (no
  transcripts / instant loop exit / rejected audio).
- whisper transcriptions are now chained sequentially (both SDKs had the
  ordering bug; Python fixed here): parallel HTTP requests with OpenAI's
  latency variance routinely delivered chunk N+1's final before chunk N,
  scrambling word order in history.
- AssemblyAI reconnect: _running stayed true through the reconnect
  handshake — the consumer polls it every 100 ms and the TLS+WS handshake
  takes longer, so a successfully reconnected (billed) session delivered
  zero transcripts.
- Soniox (both SDKs): add finalize() ({type:'finalize'}) so the VAD
  speech_end fast-path actually works (every turn previously waited out
  the full endpointing delay), and stop re-emitting identical interims on
  token-less keepalive frames.
- OpenAITranscribeSTT (both SDKs): reject verbose_json up front — the
  gpt-4o transcribe models 400 on it, so every chunk failed (logged only)
  while audio kept being buffered and billed.
- deepgram: Transcript.words back to a tuple (frozen-dataclass contract);
  providers.deepgram() helper smart_format default aligned with the class
  (the two entry points behaved differently); providers.soniox(language=)
  now maps to language_hints instead of being silently discarded.

Audio:
- StatefulResampler.flush() (py) fed the partial-frame carry to ratecv,
  which ALWAYS raises on a non-whole frame — every odd-length stream
  crashed the flush path. Drop the sub-frame remainder like TS.
- TS 16k→8k FIR decimator rewritten with a real lookahead carry: the old
  single-pending-sample design processed the carried sample twice (lost
  the true s-2) and edge-replicated the +2 tap at every chunk end —
  audible crackle at chunk boundaries on the main Twilio outbound path.
  Chunked output is now bit-identical to one-shot output (regression
  tests added).
- AEC far-end taps (3 py + 2 ts sites) gated on the carrier-native fast
  path: with the TTS adapter auto-flipped to ulaw_8000 they pushed mulaw
  wire bytes into an int16-PCM-16k echo canceller — garbage reference,
  and odd-length chunks crashed np.frombuffer mid-turn (misreported as an
  LLM error).
- Silero VAD (py): queue transitions beyond the first per process_frame
  instead of dropping them (a chunk spanning speech_end→speech_start lost
  the start event); reset() clears the queue.
- background_audio builtin_clip_path returned a path whose as_file
  context had already exited — on zip-based installs the extracted temp
  file was deleted before use. Keep the context open for the process
  lifetime (same pattern as silero_onnx).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(tools): timeout defaults, breaker probe gating, schema generation, Gemini sanitization

- tool executor (py): handler tools (which is what every MCP tool is) ran
  UNBOUNDED when no per-tool timeout was declared — a hung tool froze the
  realtime event loop indefinitely. Apply the documented 10 s default
  (mirrors TS); webhook timeouts are now terminal like handler timeouts
  (retrying multiplied a dead webhook's wait to many minutes per turn);
  non-JSON-serializable returns no longer burn the whole retry+backoff
  loop (default=str).
- consult (both SDKs): the consult tool now declares its own timeout
  budget — TS DefaultToolExecutor raced the handler against the 10 s
  default and killed any consult longer than that, while the handler's
  own budget was 30 s; the Python tool needed the declaration for the new
  executor default.
- circuit breaker (py): HALF_OPEN admitted unlimited concurrent probes
  (comment said one) — a burst of parallel tool calls hammered a
  recovering backend. Gate with probe_in_flight like TS.
- @tool schema generation (py): PEP 604 unions (str | None) have origin
  types.UnionType, not typing.Union — the idiomatic 3.10+ spelling mapped
  to {type: object} and was wrongly marked required. Literal[...] now
  emits enum, list[X] emits items (Gemini rejects array schemas without
  items).
- define_tool (py) returned a plain dict that Patter.agent(tools=[...])
  rejects with TypeError since 0.5.0 — now returns the public Tool.
- Gemini schema sanitization (google_llm.py, gemini_live.py,
  google-llm.ts, gemini-live.ts): recursively strip JSON-Schema keys the
  proto Schema rejects ($schema, additionalProperties, oneOf, …) —
  strict-mode tools REQUIRE additionalProperties:false and nearly every
  zod-derived MCP server emits $schema, so one such tool 400'd every
  Gemini turn/session.
- MCP (ts): transport throws from callTool now return the structured
  error envelope instead of reaching the executor's retry loop, which
  re-fired non-idempotent MCP tools up to 3x on transient errors (parity
  with Python).

* fix(engines): GA listener leak, ConvAI wiring, truncate targets

- openai-realtime-2.ts: the connect() monkeypatch registered every message
  listener as an anonymous wrapper, so removeListener (identity match)
  never removed the setup listener — it stayed attached for the whole call
  and its error branch ran ws.close() on the FIRST benign mid-call error
  frame (commit-empty, truncate-too-short, parallel tool-result
  conversation_already_has_active_response), tearing down the live engine
  socket. The patched on() now keeps a handler→wrapped map and off()
  translates through it; the setup listener also hard-ignores messages
  once settled.
- ElevenLabs ConvAI (TS): the adapter hardcoded language 'it' (every
  conversation forced to Italian or failing initiation) and always sent a
  voice_id override (ElevenLabs rejects overrides not enabled in the
  agent's security settings — broke default-configured agents). Overrides
  are now sent only when explicitly configured. buildAIAdapter switches
  to the options form with ulaw_8000 in/out — the positional form sent no
  output_format, so ConvAI streamed PCM16@16k onto the mulaw carrier wire
  (loud static on every TS ConvAI call).
- ElevenLabs ConvAI (py): the Telnyx bridge built the handler without
  for_twilio=True even though Telnyx negotiates PCMU 8 kHz — caller mulaw
  bytes were fed to ConvAI as PCM16@16k (garbled in both directions;
  Twilio/Plivo branches were already correct).
- OpenAI Realtime (both SDKs, v1+GA): response.output_item.added also
  fires for function_call items — recording those as the truncate target
  made barge-in during a tool turn truncate a non-message item, which the
  server rejects with an error event. Only message items are tracked now.
- OpenAI Realtime GA (both SDKs): the GA session schema removed
  'temperature' — forwarding it made the session.update fail and the call
  drop at pickup. Warn-and-skip.

* fix(observability): GA cost attribution, metric correctness, telemetry opt-out hygiene

- cost: openai_realtime_2 calls fell through to the pipeline branch of
  _compute_cost in BOTH SDKs (exact-string match on 'openai_realtime') and
  reported $0 AI cost for the most expensive engine — while still emitting
  cached savings. TS also never captured realtimeModelName / the realtime
  provider tag for GA calls.
- agent_response_ms (py): llm_ttft_ms is initialised to 0.0 so the
  'is not None' gate was always true — when no first-token signal fired
  the flagship SLO metric silently EXCLUDED the whole LLM segment. Now
  gated on the actual signal (TS already leaves it undefined).
- EOU delay (py): the final-transcript path stamped record_vad_stop
  unconditionally, overwriting the real VAD speech_end stamp microseconds
  before stt_final — end_of_utterance_delay was always ≈0 and the fake
  endpoint signal defeated record_stt_complete's own don't-fake logic.
  The fallback stamp is now first-wins.
- InterruptionMetrics units: Python emitted SECONDS, TS milliseconds —
  cross-SDK consumers were 1000x apart for the same event. Python now
  emits ms (every other latency field is *_ms); TS gains Python's
  early-return so stray overlap-ends no longer inflate interruption
  counts. Docstring + regression test updated.
- call-log (ts): logTurn/logEvent/logCallEnd re-derived the day directory
  from Date.now(), so calls crossing midnight UTC split across two day
  dirs — the original metadata stayed 'in_progress' forever and the
  dashboard hydrate resurrected phantom live calls. Per-call startedAt
  map added (mirrors Python).
- telemetry opt-out (both SDKs): the environment-dims helper ran
  unconditionally at construction and its previousVersion probe WROTE
  ~/.getpatter/version — violating the documented 'opting out never
  touches the filesystem' invariant. Now gated on enabled. Numeric
  dimensions (latency_ms/cost_usd/…) now require numbers — the one gap
  that let free text reach the wire.
- pricing (both SDKs): gpt-4.1 $3/$12 → $2/$8 and gpt-4.1-mini
  $0.80/$3.20 → $0.40/$1.60 (published OpenAI rates; siblings were
  correct); gpt-4o-realtime-preview audio still carried the Oct-2024
  launch price ($100/$200) — cut to $40/$80 in Dec-2024.
- evals (py): one transient judge failure (429/timeout/missing key)
  aborted the whole suite and discarded every completed case — now
  recorded as a failed case; the verdict is computed locally from the
  score instead of trusting the judge's self-reported 'passed' (a
  hallucinated passed:true with score 0.2 recorded a pass).
- observability exports (ts): shutdownTracing/withSpan/recordPatterAttrs/
  patterCallScope/attachSpanExporter were not exported from the package
  root — users could not flush the BatchSpanProcessor Patter creates
  (NodeTracerProvider does not flush on exit), silently dropping trailing
  spans.
- minor parity: Python _opt_avg now filters zeros like TS optAvg; TS
  recordTtsFirstByte emits only inside the first-byte latch (it
  previously re-emitted stale TTFB events); stale 'masked by default'
  phone-redaction docstrings corrected in both SDKs.
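
For the evals bullet above, computing the verdict locally means the judge's self-reported `passed` flag is ignored and only the numeric score is compared against a threshold. A minimal sketch; the 0.7 threshold, the response shape, and the name `verdict_from_judge` are assumptions for illustration:

```python
def verdict_from_judge(response: dict, pass_threshold: float = 0.7) -> bool:
    """Derive pass/fail from the score only.

    The judge's own 'passed' field is deliberately not read, so a
    hallucinated passed=True with a low score cannot record a pass.
    pass_threshold=0.7 is an assumed value for this sketch.
    """
    return float(response["score"]) >= pass_threshold
```

A response of `{"passed": True, "score": 0.2}` therefore yields a failing verdict.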

* fix(tts): Telnyx wire format, Cartesia 8k double-decimation, chunker/facade parity

- ElevenLabs 'native format' for Telnyx was declared pcm_16000 across the
  TTS layer (both SDKs: native-format maps, for_telnyx factories, and the
  stream handlers' native checks) while the SDK's own streaming_start pins
  the Telnyx wire to PCMU/μ-law @ 8 kHz — the native fast path therefore
  shipped raw PCM16 bytes onto a μ-law wire: pure static on every default
  ElevenLabs-on-Telnyx call. Every surface now agrees on ulaw_8000 (the
  TS handler check also gates on known carriers); tests updated to the
  corrected contract.
- CartesiaTTS.for_twilio/forTwilio (and the pipeline facades, both SDKs)
  requested sample_rate=8000, but the audio sender has no consuming hook
  for a declared TTS rate — it unconditionally runs its fixed 16k→8k
  decimator, so the 8 kHz audio was decimated AGAIN and played at ~2x
  speed (chipmunk) on every call using the documented factory. The
  factories now emit 16 kHz (the pipeline rate).
- sentence chunker (both SDKs): a standard-path emission now ends the
  aggressive 'first flush' window — only the aggressive flush cleared the
  flag, so a comma in sentence 2+ could still trigger a clause-level
  flush mid-turn (choppy prosody, contradicting the documented contract).
- tts/openai.py facade default aligned to gpt-4o-mini-tts (the underlying
  provider default and the TS facade) — the same nominal config produced
  different voice/latency/price per SDK.
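
The Cartesia double-decimation bullet above can be checked with simple arithmetic: the sender always halves the sample count and the wire always plays at 8 kHz, so audio that was already at 8 kHz ends up lasting half as long. A small sketch of that calculation (the function name is hypothetical):

```python
def played_duration_s(num_samples: int, decimation: int, wire_rate_hz: int) -> float:
    """Seconds of audio heard after a fixed decimator feeds a fixed-rate wire."""
    return (num_samples // decimation) / wire_rate_hz


# One second of speech synthesized at 16 kHz: 16000 samples -> 8000 -> 1.0 s.
correct = played_duration_s(16_000, decimation=2, wire_rate_hz=8_000)

# One second synthesized at 8 kHz but still decimated: 8000 -> 4000 -> 0.5 s.
chipmunk = played_duration_s(8_000, decimation=2, wire_rate_hz=8_000)
```

Half the duration for the same content corresponds to the ~2x playback speed described in the bullet.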

* test(tts): update remaining Cartesia pipeline-facade tests to the 16 kHz contract

* fix(parity): restore the cross-SDK parity suite; fail fast on missing OpenAI key in TS agent()

The parity runner still referenced the pre-monorepo layout (sdk/, sdk-ts/,
package 'patter') — every one of the 10 scenarios failed, so the suite had
silently stopped guarding parity. Restored:
- paths/imports updated to libraries/python + libraries/typescript/dist
  and the getpatter package (tool_executor moved to tools/).
- call_init / voice_mode_enum rewritten for the modern carrier-object API
  (cloud mode and DEFAULT_*_URL report 'removed' on both sides).
- the TS shim silences the telemetry banner and the runner parses the last
  stdout line, so SDK construction output can't corrupt the JSON protocol.
- sentence_chunker delegates to its dedicated standalone runner (xfail
  semantics) instead of counting as a failure.

The revived suite immediately caught a real divergence: Python fails fast
in agent() when OpenAI Realtime mode has no API key, TS deferred to call
time (dead call instead of a clear construction error). TS now validates
eagerly too; test helpers gain stub keys.

Suite result: 10/10 scenarios matched (sentence_chunker: 53 pass, 8 xfail).

* fix(dashboard-app): live time window, ringing status, deletion tombstones — rebuilt ui.html

- App.tsx: the bucket strategy captured Date.now() once and was memoized on
  [range] only, so the time window froze at mount — a dashboard left open
  past the window edge silently dropped every call that ended after it
  (within at most 1 hour on the default 24h view). The window now re-anchors
  on a 30 s tick.
- mappers: Twilio 'ringing'/'queued' statuses mapped through the default
  branch to 'ended' — every outbound call showed an "ended" pill for the
  whole 10-30 s ring phase. They now map to the (already styled, previously
  unused) 'queued' pill and count as ongoing.
- mergeCallPreserving resurrected soft-deleted calls forever: deletions are
  absent from the server snapshot by design, and the prev-carry-over loop
  re-appended them on every refresh (cross-tab deletes never propagated).
  The calls_deleted SSE payload and local deletes now feed a tombstone set
  consulted by the merge.
- turnCount transcript fallback now halves the line count (counting one
  line per user AND assistant message double-counted turns past the
  percentile gate);
  'All' range sparkline now actually derives its window from data extents
  (the {fromMs: 0} sentinel was truthy, so every call landed in the
  rightmost 1970→now bucket).
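
The tombstone fix for mergeCallPreserving can be sketched as follows. The real code is TypeScript in the dashboard app; this Python version only illustrates the idea, and the `id` field and function name are assumptions:

```python
def merge_calls(prev: list[dict], snapshot: list[dict], tombstones: set[str]) -> list[dict]:
    """Merge a server snapshot with previously known calls.

    Calls missing from the snapshot are carried over from `prev` unless
    their id is in `tombstones` (deleted locally or via a deletion event).
    """
    merged = {c["id"]: c for c in snapshot if c["id"] not in tombstones}
    for call in prev:
        if call["id"] not in merged and call["id"] not in tombstones:
            merged[call["id"]] = call
    return list(merged.values())
```

Without the tombstone check, the carry-over loop would re-append a deleted call on every refresh, because deletions are simply absent from the snapshot.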

dist rebuilt and synced into both SDKs' dashboard/ui.html (identical md5).

* fix(pipeline): play the greeting as a background task — both SDKs

The firstMessage streamed INLINE: in Python inside
PipelineStreamHandler.start(), which the carrier bridge awaits from its
single WS read loop,
so for the whole greeting no media frames were processed (VAD/barge-in
structurally impossible on the first message), stop frames went unnoticed,
and prewarmed mark-gated pacing starved because mark acks could never be
read (0.5 s timeout per chunk → ~13x slower than realtime, guaranteed
jitter underrun). The TS handler had the same shape inside handleCallStart:
the recent per-connection FIFO serialization made it block the same way.

Both handlers now await beginSpeaking(is_first_message=true) BEFORE
returning (the self-hearing guard engages from the very first inbound
frame) and stream the greeting in a tracked background task
(_play_first_message / playFirstMessage). Teardown cancels (py) /
settles (ts) the task before adapters close; failures log instead of
killing call setup.
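
The shape of this fix can be illustrated with a small asyncio sketch: `start()` sets the speaking flag and returns immediately, the greeting streams in a tracked task, and teardown cancels it. The class and method bodies here are stand-ins (the sleeps replace real TTS sends), not the SDK's handler:

```python
import asyncio
from typing import Optional


class GreetingHandler:
    def __init__(self) -> None:
        self.is_speaking = False
        self.frames_seen = 0
        self._greeting_task: Optional[asyncio.Task] = None

    async def start(self) -> None:
        # Engage the speaking guard BEFORE returning to the read loop.
        self.is_speaking = True
        self._greeting_task = asyncio.create_task(self._play_first_message())

    async def _play_first_message(self) -> None:
        try:
            for _ in range(5):
                await asyncio.sleep(0.01)  # stand-in for sending one TTS chunk
        except asyncio.CancelledError:
            raise
        except Exception as exc:  # failures log instead of killing setup
            print(f"greeting failed: {exc!r}")

    def on_media_frame(self) -> None:
        self.frames_seen += 1

    async def cleanup(self) -> None:
        task = self._greeting_task
        if task is not None and not task.done():
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)


async def demo() -> tuple:
    handler = GreetingHandler()
    await handler.start()  # returns without waiting for the greeting
    speaking_after_start = handler.is_speaking
    for _ in range(3):  # the read loop keeps processing inbound frames
        handler.on_media_frame()
        await asyncio.sleep(0)
    await handler.cleanup()
    return speaking_after_start, handler.frames_seen
```

Because `start()` no longer awaits the greeting, inbound frames are handled while it plays, which is what makes barge-in on the first message possible.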

* fix(stt): clone the configured adapter per call — both SDKs

STT adapters are stateful per-connection objects, but the documented usage
hands ONE instance to ONE agent served for MANY calls: concurrent calls
shared a socket/queue/callback set, so call B's connect() overwrote call
A's WebSocket, each call could receive the OTHER caller's transcripts, and
the first hangup closed the surviving call's socket.

Python: STTProvider.__init_subclass__ now captures every subclass's
ORIGINAL constructor arguments (outermost call wins through inheritance
chains, zero per-provider code — user subclasses included) and a generic
clone() replays them; _create_stt_from_config clones provider instances
per call, degrading to the legacy shared instance with a loud warning when
clone() fails. Verified across all 7 streaming providers.

TypeScript: each provider records its construction args and exposes
clone(); createSTT() clones per call with the same loud-warning fallback
for adapters without clone().

The identity test updated to the new contract (same type/config, fresh
connection state, distinct per call).
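
The Python mechanism described above (capture constructor arguments in `__init_subclass__`, replay them in `clone()`) can be sketched like this. It is a simplified illustration under the assumption that constructor arguments are safe to reuse; `FakeSTT`, `TunedSTT`, and the `_ctor_args` attribute are hypothetical names:

```python
import functools


class STTProvider:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if "__init__" not in cls.__dict__:
            return  # subclass inherits an already-wrapped constructor
        original_init = cls.__dict__["__init__"]

        @functools.wraps(original_init)
        def capturing_init(self, *args, **kw):
            # The outermost constructor runs first, so record args only once.
            if "_ctor_args" not in self.__dict__:
                self._ctor_args = (args, dict(kw))
            original_init(self, *args, **kw)

        cls.__init__ = capturing_init

    def clone(self):
        """Build a fresh instance from the original constructor arguments."""
        args, kw = self.__dict__.get("_ctor_args", ((), {}))
        return type(self)(*args, **kw)


class FakeSTT(STTProvider):
    def __init__(self, api_key, language="en"):
        self.api_key = api_key
        self.language = language
        self.socket = None  # per-connection state, must not be shared


class TunedSTT(FakeSTT):
    def __init__(self, api_key):
        super().__init__(api_key, language="it")


shared = TunedSTT("key-123")
shared.socket = object()  # simulate call A's live connection
per_call = shared.clone()
```

The clone keeps the configuration (`api_key`, `language`) but starts with fresh connection state, so two concurrent calls no longer share a socket.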

* fix(api): persist default parity, per-call first_message, TS onCallStart overrides

- persist (TS): both SDKs' docs state persistence is ON by default since
  0.6.2 (the dashboard hydrate path needs on-disk records across restarts),
  but resolvePersistRoot had regressed to opt-in — with persist omitted and
  no PATTER_LOG_DIR, TS silently wrote nothing while Python persisted.
  Aligned to the documented default.
- call(first_message=...) (py) was documented but never referenced in the
  body — now applied as a per-call frozen-dataclass copy of the agent so
  prewarm synthesis, the bridge and the handler all see the override. TS
  gains the same option (LocalCallOptions.firstMessage) for parity.
- onCallStart per-call overrides (TS): Python has applied a dict returned
  from on_call_start as per-call agent config since 0.5.x; TS typed the
  callback void and ignored the result. The handler now applies returned
  overrides (snake_case keys mirroring apply_call_overrides; the
  Python-only stt_config/tts_config keys warn-and-skip), the server's
  logging wrapper forwards the return value instead of swallowing it, and
  the public callback types accept the override shape.

* fix(audio/core): AEC far-end staleness + adaptation floor; 1h max-call watchdog in Python

- NLMS AEC (both SDKs): the far-end ring only advances on push, so while
  the agent was silent processNearEnd convolved the SAME frozen TTS tail
  into every 20 ms user frame forever — a repeating buzz at echo-estimate
  amplitude superimposed on user speech, exactly when there is no echo to
  cancel. The canceller now passes through when the reference is stale
  (>250 ms since the last far-end push). The adaptation floor also rises
  from 1e-6 (-120 dBFS — a TTS fade-out still 'counted' as far energy,
  letting weights blow up against user speech with a near-zero norm) to
  1e-3 (≈ -60 dBFS), freezing adaptation on an effectively-silent
  reference.
- Python stream handlers gain the 1-hour auto-hangup watchdog TS has had
  all along (armed in each start(), cancelled in cleanup()) — a call whose
  carrier stop never arrives could previously run and bill forever.
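
The dBFS figures and the staleness gate in the AEC bullet can be checked with a few lines. The sketch treats the floor as a linear amplitude relative to full scale, which matches the quoted -120 and -60 dBFS values; the helper names are hypothetical:

```python
import math

STALE_AFTER_S = 0.250  # pass through when the far-end reference is older than this


def dbfs(amplitude: float) -> float:
    """Convert a linear full-scale amplitude to dBFS."""
    return 20.0 * math.log10(amplitude)


def reference_is_stale(last_far_end_push_s: float, now_s: float) -> bool:
    """True when no far-end audio was pushed within the staleness window."""
    return (now_s - last_far_end_push_s) > STALE_AFTER_S


old_floor_db = round(dbfs(1e-6))  # -120
new_floor_db = round(dbfs(1e-3))  # -60
```

When `reference_is_stale` is true, the canceller in this sketch would return the near-end frame unchanged instead of convolving it with the frozen TTS tail.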

* fix(engines): ConvAI client tools, Gemini Live transcriptions, Ultravox event semantics

- ElevenLabs ConvAI (both SDKs): handle the previously-ignored
  client_tool_call event — configured client tools stalled until the
  provider-side timeout and reported failure. The adapters surface it as
  the shared function_call event and gain a client_tool_result sender;
  the handlers route through the tool executor and ALWAYS answer
  (unknown tools and execution errors included — silence stalls the
  ElevenLabs agent). transfer_call/end_call declared as ElevenLabs client
  tools now reach the carrier helpers.
- Gemini Live (both SDKs): enable input/output audio transcription in the
  session config and parse serverContent.inputTranscription/
  outputTranscription — native-audio sessions previously produced NO user
  transcript ever and no assistant transcript in AUDIO modality
  (logs/history/metrics empty for every Gemini Live call). goAway is now
  logged loudly (the only warning before the server drops the ~10-15 min
  session).
- Ultravox (both SDKs): 'listening' is entered after EVERY normal agent
  turn — mapping it to speech_started cleared the carrier playout buffer
  at each turn end, clipping the audio tail; turn end is now the
  speaking→listening transition ('idle' never fires mid-call, so
  response_done effectively never fired before). Agent transcripts emit
  delta frames only — full-text frames forwarded as appends duplicated
  the transcript ('Hel'+'lo'+'Hello').

* fix(observability): pipeline speech events end-to-end + events.jsonl operational log

- Thread the SpeechEvents dispatcher through all three Python carrier
  bridges into every handler; ConvAI and Pipeline handler ctors now
  accept/forward speech_events like the Realtime handler already did.
- Emit the documented user/agent speech events from pipeline mode in
  both SDKs (VAD start/stop, EOS on turn commit, agent begin/end with
  interrupted=true on barge-in) — previously realtime-only.
- Write events.jsonl (documented since 0.6 but never written):
  tool_call/tool_result records from role=tool transcript lines,
  barge_in from interrupted turns, error from CallMetrics.error_code
  (also persisted as metadata.json "error") — wired in both servers'
  logging-callback wrappers, with unit tests reading the files back.
- Update stale test contracts surfaced by the full-suite run: remote WS
  mock gained recv() (per-receive idle-timeout loop), transfer_call now
  rejects non-E.164 before invoking the carrier helper, guardrail
  replacement speaks via send_reassurance instead of send_text.

* feat(evals): EvalSession — eval harness that drives the REAL pipeline call loop

The existing eval runner scored an arbitrary reply(text) -> str callable
and never exercised the SDK. This adds getpatter.evals.session.EvalSession,
which constructs an actual PipelineStreamHandler and injects user turns
through the same path a live call uses — the real STT receive loop
(_handle_barge_in -> _commit_transcript -> _dispatch_turn), the real
LLMLoop with the real ToolExecutor, pipeline hooks, guardrail replacement,
dedup/hallucination filtering, history handling, and metrics — with only
the paid/external boundary faked:

* FakeAudioSender (records send_audio/send_clear/send_mark, auto-acks marks)
* FakeSTT (queue-backed; finals flow through the handler's real _stt_loop)
* FakeTTS (records spoken sentences, yields 10 ms of silence)
* ScriptedLLMProvider (deterministic chunk scripts for CI) or any real
  LLMProvider for live evals

API: `async with EvalSession(agent=..., llm_provider=...) as s:` then
`result = await s.user_says("...")` -> frozen TurnResult(agent_text =
what the caller heard post-guardrails/hooks, tool_calls, history_snapshot,
interrupted, metrics_turn). getpatter.evals.assertions.expect(result) adds
chainable tool_called(name, args_subset=) / no_tool_called() /
agent_text_contains(...) and an async judge(llm_judge, intent=...) that
reuses the existing LLMJudge.

EvalCase gains optional agent= / llm_provider= fields; EvalRunner routes
those through EvalSession while the legacy reply()-factory path (and the
`patter eval` CLI, which keeps that contract) is unchanged — both flavours
mix in one suite. stream_handler.py is untouched: the session drives
existing public/internal methods only.

Tests (no network, scripted provider) prove: (a) a tool-call case asserts
via tool_called with the REAL ToolExecutor running a local handler, (b)
multi-turn history accumulates with prior turns exactly once (pinning the
existing cross-SDK trailing duplicate of the current user message), (c)
guardrail replacement is observable in agent_text, (d) cleanup leaves no
pending tasks (asyncio.all_tasks comparison), plus commit-filter drops
surfacing loudly, hooks, metrics capture, and runner integration.

Python-only by design (the TypeScript CLI already prints an evals stub).

* docs: pipeline-stages decomposition design (InputChain / TurnManager / OutputChain)

Proposes splitting PipelineStreamHandler's three interleaved state machines
into composable stages, inventories today's states/transitions and maps every
PipelineHooks surface and recently-fixed bug to its owning stage, and defines
a 4-slice migration plan that keeps the public API and all existing tests
green. Slice 1 (InputProcessingChain + audio_filter wiring) lands separately.

* fix: wire agent.audio_filter into the pipeline via an extracted InputProcessingChain (slice 1)

audio_filter / audioFilter (Krisp / DeepFilterNet) was accepted by the public
API, documented as "integrated before VAD and STT", implemented and
unit-tested — but never invoked by any pipeline. Slice 1 of the
pipeline-stages decomposition (docs/architecture/pipeline-stages.md) extracts
the inbound half of on_audio_received / handleAudio into an
InputProcessingChain that owns decode (mulaw->PCM16) -> stateful 8k->16k
resample -> AEC near-end -> audio_filter (NEW) -> VAD feed and returns the
processed frame + VAD event. The handlers keep the downstream logic
(VAD-event handling, self-hearing gate, ring buffer, beforeSendToStt hook,
STT feed) so the diff stays reviewable; with no AEC/filter/VAD configured the
byte path is identical to before.

- Filter wrapper is fail-open: raise / non-bytes return -> passthrough of the
  pre-filter PCM, WARN once then DEBUG, keeps attempting.
- AEC/filter/VAD resolved via late-bound getters (start() and test fixtures
  install _aec/_auto_vad after construction).
- TS chain also owns the per-call VAD error kill switch (former vadDisabled)
  including the 25 ms ONNX inference timeout race.
- (Python) KrispVivaFilter now re-frames input internally to its configured
  frame_duration_ms (remainder buffered, dropped on sample-rate change)
  instead of raising on the pipeline's 20 ms frames vs its 10 ms default.
- Tests: chain-level order assertion (AEC -> filter -> VAD via recording
  fakes), warn-once passthrough, mulaw + PCM parity vs stateful reference,
  handler-level proof that agent(audio_filter=...) transforms the bytes
  reaching a fake STT; Krisp re-framing unit tests. Both full unit suites
  pass unchanged (py 1435 passed/19 skipped; ts 1866 passed/8 skipped, tsc
  clean).
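
The fail-open filter behaviour from the first bullet (passthrough on raise or non-bytes return, WARN once then DEBUG, keep attempting) could look roughly like the sketch below. `FailOpenFilter` and the callable-based filter interface are assumptions for illustration, not the chain's real wrapper:

```python
import logging

log = logging.getLogger("filter_sketch")


class FailOpenFilter:
    """Wrap an audio filter so any failure passes the input through unchanged."""

    def __init__(self, inner):
        self._inner = inner  # callable: bytes -> bytes (assumed interface)
        self._warned = False

    def process(self, pcm: bytes) -> bytes:
        try:
            out = self._inner(pcm)
            if not isinstance(out, (bytes, bytearray)):
                raise TypeError(f"filter returned {type(out).__name__}")
            return bytes(out)
        except Exception as exc:
            # First failure logs at WARNING, later ones at DEBUG.
            level = logging.DEBUG if self._warned else logging.WARNING
            self._warned = True
            log.log(level, "audio filter failed, passing through: %r", exc)
            return pcm  # pre-filter PCM; the next frame is attempted again
```

The wrapper never disables the filter, so a transient failure costs one unfiltered frame rather than the rest of the call.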

* test(server): make events.jsonl tool-event assertions order-insensitive

Both logEvent calls in the transcript wrapper are fire-and-forget, so
the on-disk append order of tool_call vs tool_result is not guaranteed
(records carry their own ts). Assert content per type instead of order.

* docs(changelog): document the review-wave fixes under Unreleased

Entries for the fix batches landed on this branch (greeting background
task, per-call STT clone, dashboard SPA window/status/tombstones,
persist default parity, call(first_message), TS onCallStart overrides,
ConvAI client tools, Gemini Live transcriptions, Ultravox event
semantics, Python 1h watchdog, AEC far-end staleness, pipeline speech
events, events.jsonl, plus the provider/transport wave) per the
AGENTS.md same-PR changelog rule.

* feat(pipeline): pause-and-resume false-interruption handling (Python)

Opt-in agent.barge_in_mode="pause_resume" (default "cancel" keeps
today's behaviour byte-identical). LiveKit-style state machine on VAD
speech_start while the agent speaks:

- PAUSE: gate the sentence/audio send loops on _output_paused and
  send_clear the carrier so queued audio stops within a frame. The LLM
  stream and TTS provider stream stay alive: sentences buffer as text
  (capped at 32) and synthesized audio queues into per-sentence
  retention entries (capped at ~15 s of playout; overflow while paused
  degrades to a full cancel, overflow while speaking releases retention
  for the turn). Mic audio flows to STT while paused (output is silent,
  so the line is echo-quiet) and the inbound ring is flushed so the
  confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination,
  non-duplicate — the existing _handle_barge_in/_commit_transcript
  filter family) within barge_in_confirm_ms (default 1500 ms) runs the
  existing _do_cancel_for_barge_in path and discards the paused
  buffers. The overlap window anchored at pause time is preserved so
  InterruptionMetrics.detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the
  cleared-but-unheard tail from retained audio at SENTENCE granularity
  (first sentence not fully played, derived from the #164
  _playback_buffered_until cursor + heard-prefix segments; the
  partially-played sentence replays from its start) without re-billing
  TTS, then release the buffered sentences through the normal synth
  path. Recorded as a false interruption via
  record_overlap_end(was_interruption=False) — the backchannel
  counter, never an interruption — plus a false_interruption event.

The playback bookkeeping is frozen at the heard offset on pause so a
kill still rewrites history to the heard prefix; on resume the replay
re-stamps segments so later barge-ins stay accurate. Turn bodies wait
out an in-flight pause decision before ending (bounded by the confirm
window) so buffered sentences are never orphaned.
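
The RESUME step replays from the first sentence that was not fully played. A simplified sketch of that selection, assuming each retained sentence has a known playout duration and that a single heard offset is available (the real implementation derives this from the playback cursor and heard-prefix segments; the function name is hypothetical):

```python
def first_unheard_sentence(durations_ms: list[float], heard_ms: float) -> int:
    """Index of the first sentence not fully played at `heard_ms`.

    Returns len(durations_ms) when every sentence was heard in full.
    A partially played sentence is returned so it replays from its start.
    """
    played_end = 0.0
    for index, duration in enumerate(durations_ms):
        played_end += duration
        if heard_ms < played_end:
            return index
    return len(durations_ms)
```

For sentences lasting 1000, 2000 and 1500 ms with a pause at 1200 ms, the replay starts at index 1, the sentence that was cut off mid-way.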

* feat(pipeline): pause-and-resume false-interruption handling (TypeScript)

TypeScript port of 3877814 — exact parity with the Python semantics,
defaults, and events (camelCase ↔ snake_case naming):

Opt-in agent.bargeInMode: 'pause_resume' (default 'cancel' keeps today's
behaviour byte-identical). LiveKit-style state machine on VAD
speech_start while the agent speaks:

- PAUSE: gate the sentence/audio send loops on outputPaused and
  sendClear the carrier so queued audio stops within a frame. The LLM
  stream and TTS provider stream stay alive: sentences buffer as text
  (capped at 32) and synthesized audio queues into per-sentence
  retention entries (capped at ~15 s of playout; overflow while paused
  degrades to a full cancel, overflow while speaking releases retention
  for the turn). Mic audio flows to STT while paused and the inbound
  ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination,
  non-duplicate — the existing handleBargeIn/commitTranscript filter
  family) within bargeInConfirmMs (default 1500 ms) runs the existing
  runBargeInCancel path and discards the paused buffers. The overlap
  window anchored at pause time is preserved so detection_delay
  measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the
  cleared-but-unheard tail from retained audio at SENTENCE granularity
  (first sentence not fully played, derived from the #164
  playbackBufferedUntil cursor + heard-prefix segments; the
  partially-played sentence replays from its start) without re-billing
  TTS, then release the buffered sentences through the normal synth
  path. Recorded as a false interruption via recordOverlapEnd(false) —
  the backchannel counter, never an interruption — plus a
  'false_interruption' event ({ resumedSentences }).

The playback bookkeeping is frozen at the heard offset on pause so a
kill still rewrites history to the heard prefix; on resume the replay
re-stamps segments so later barge-ins stay accurate. Turn bodies wait
out an in-flight pause decision before ending — completes the
predecessor's in-progress port by bounding awaitPauseDecision (confirm
window + 5 s fail-open margin, mirroring Python's
_await_pause_decision) so a teardown race can never strand the
dispatch loop.

Tests mirror tests/unit/test_barge_in_pause_resume.py: pause gates
without cancelling, paused buffering + overflow degradation, resume
tail replay + false-interruption metrics/event, kill filters
(final-only / hallucination / duplicate / frozen-prefix history
rewrite), legacy cancel mode untouched, config-off defaults,
streaming-loop integration (resume, kill, stream-ends-paused), and
teardown mid-pause.

* feat(python): preemptive generation — speculative LLM+TTS on confident interim transcripts

Opt-in Agent.preemptive_generation (default False): on an interim transcript
that ends with sentence-final punctuation or is unchanged for
preemptive_min_stable_ms (default 300), pipeline mode starts a speculative
dispatch — built-in LLM loop + sentence-chunked TTS — and HOLDS all audio in
memory (bounded ~15 s; overflow aborts). When the final transcript commits:

- normalized match → RELEASE: buffered audio flushes to the carrier and the
  speculative task becomes the live turn; history/metrics record exactly one
  turn with the final transcript text as the user message, and TTFT/latency
  anchors are stamped from the REAL commit point (user-perceived timing).
- mismatch → discard via the cancel-event machinery (history untouched) and
  dispatch normally on the final.

At most one speculation in flight (a newer qualifying interim replaces it);
VAD speech_start during speculation aborts silently. The consume loop races
the next LLM token against the release signal so a commit mid-token-silence
flushes immediately. New CallMetrics counters preemptive_hits /
preemptive_misses (accumulator record_preemptive_hit/_miss).
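
The two decisions described above (when an interim qualifies for speculation, and whether the final transcript releases or discards it) can be sketched as below. The normalization shown (lowercase, strip punctuation, collapse whitespace) is an assumption, since the commit does not spell out its rules, and the function names are hypothetical:

```python
import string

SENTENCE_FINAL = (".", "!", "?")
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    return " ".join(text.lower().translate(_STRIP_PUNCT).split())


def should_speculate(interim: str, unchanged_ms: float, min_stable_ms: float = 300) -> bool:
    """Qualify on sentence-final punctuation or a stable interim."""
    return interim.rstrip().endswith(SENTENCE_FINAL) or unchanged_ms >= min_stable_ms


def decide(speculated_on: str, final: str) -> str:
    """'release' the held audio on a normalized match, else 'discard'."""
    return "release" if normalize(speculated_on) == normalize(final) else "discard"
```

A punctuated interim such as "What's my balance?" qualifies immediately, and a final of "what's my balance" releases the held audio because both normalize to the same string.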

* fix(python): keep the newer speculation when _start_speculation loses the registration race

_start_speculation awaits the old speculation's unwind (bounded 5 s)
before registering its own _SpeculativeTurn. The interim-stability
watcher and the STT receive loop both call it, so a second path could
register a NEWER speculation during that await — and the resuming
caller then overwrote it. The overwritten turn's task was orphaned
parked on its release_event forever: never aborted, never released,
never counted as a miss, holding up to ~15 s of buffered audio and an
open LLM stream until call teardown. It also broke the documented
at-most-one-speculation invariant (two tasks generating concurrently).

Guard the registration: after the abort settles, yield to any
speculation registered concurrently — it always corresponds to the
later-arriving interim. Regression test interleaves a concurrent
registration into the replacement window. Mirrored in the TS port's
startSpeculation.

* feat(typescript): preemptive generation — speculative LLM+TTS on confident interim transcripts

TS port of 238214e at parity: opt-in agent.preemptiveGeneration
(default false) + preemptiveMinStableMs (default 300). On an interim
that ends with sentence-final punctuation, or is unchanged for the
stability window, pipeline mode starts a speculative dispatch —
built-in LLM loop + sentence-chunked TTS — and HOLDS all audio in
memory (~15 s playout cap; overflow aborts). When the final transcript
commits:

- normalized match → RELEASE: buffered audio flushes to the carrier and
  the speculative task becomes the live turn (tracked via dispatchTask);
  history/metrics record exactly one turn with the final transcript text
  as the user message, TTFT/latency anchors stamped from the real commit
  point (user-perceived timing).
- mismatch → discard via the AbortController machinery (history and
  carrier untouched) and dispatch normally on the final.
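
The commit message does not spell out the normalization rule, so the following is only an assumed example of a "normalized match": case-folding, punctuation stripping and whitespace collapsing. Both function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def resolve_speculation(interim: str, final: str) -> str:
    """Return "release" when the final matches the speculated interim."""
    if normalize(interim) == normalize(final):
        return "release"
    return "discard"
```

Under this rule `resolve_speculation("What's my balance?", "whats my  balance")` releases, while any change in wording discards and falls back to a normal dispatch on the final.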

At most one speculation in flight: a newer qualifying interim replaces
it (noteInterimTranscript is awaited on the transcript drain loop so
replacements serialize — parity with Python's awaited
_note_interim_transcript — and startSpeculation yields to a
concurrently registered newer speculation, mirroring the Python fix).
VAD speech_start during speculation aborts silently; handleStop /
handleWsClose tear down without a miss. The token consume loop races
the next LLM token against the release decision so a commit
mid-token-silence flushes buffered audio immediately. New CallMetrics
counters preemptive_hits / preemptive_misses
(recordPreemptiveHit/Miss), mirroring Python.

One TS-specific addition over the straight port: the released
speculative task clears dispatchTask in its finally exactly like
dispatchTurn's finally does — without this, canSpeculate() (which
requires dispatchTask === null; the TS null-on-done convention, vs
Python's dispatch.done()) stayed false for the rest of the call after
the first hit. Covered by the sequential-two-hits regression test.

Tests mirror Python's test_preemptive_generation.py: immediate start on
punctuated interims, stability-window start, release (single LLM call,
single history turn, hit counted, no audio before commit), mid-stream
release flush, mismatch discard + normal re-dispatch, VAD abort,
replacement, same-interim dedupe, buffer overflow, teardown without a
miss, speculation gates (speaking / dispatch in flight), default-off.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(pipeline): semantic end-of-utterance detection via smart-turn v3 (opt-in, both SDKs)

Integrate the open pipecat-ai smart-turn v3 ONNX end-of-utterance model
as an optional semantic turn detector for pipeline mode in the Python
and TypeScript SDKs.

Design:
- Provider: SmartTurnDetector (getpatter/providers/smart_turn.py,
  src/providers/smart-turn.ts) implements the new TurnDetectorProvider
  interface (threshold / predict(pcm16-16k window) / close). The Whisper
  log-mel preprocessing (reflect-padded 400-pt STFT, 80 Slaney mel
  filters, last-8s left-padded window, zero-mean/unit-variance
  normalize) is ported natively in each SDK; cross-SDK numeric parity is
  locked by a reference-value test generated from the Python
  implementation.
- Wiring: Agent.turn_detector / agent.turnDetector, off by default —
  the speech_end path is unchanged when unset. On a VAD speech_end the
  handler scores the rolling 8 s caller-audio window: probability >=
  threshold finalizes STT immediately (end-of-turn fires early); below
  threshold the finalize is HELD and re-scored every ~200 ms of further
  silence, capped by Agent.max_semantic_hold_ms / maxSemanticHoldMs
  (default 1200 ms, then plain vad_silence). A frame-driven poll plus a
  generation-guarded wall-clock backstop guarantee the cap even if
  inbound audio stalls; a VAD speech_start or an STT-side transcript
  commit cancels the hold.
- Speech events: pipeline mode now fires on_user_speech_eos (only when
  a detector is configured — zero behavior change otherwise) with
  trigger EouTrigger.SEMANTIC_TURN_DETECTOR when the model decided the
  commit, and EouTrigger.VAD_SILENCE otherwise.
- Graceful degradation: onnxruntime/numpy stay optional (the
  getpatter[turn-detector] extra; onnxruntime-node optionalDependency),
  imported lazily. SmartTurnDetector.maybe_load() / maybeLoad() warns
  once and returns None/undefined when the runtime or the model file
  (PATTER_SMART_TURN_MODEL or model_path) is unprovisioned, so the
  agent runs plain VAD-silence endpointing instead of crashing; load()
  keeps fail-fast errors with install/download instructions. At call
  time the handler fails open AND fails once: the first predict error
  logs a single warning and disables the detector for the rest of the
  call (the existing vadDisabled pattern).
- Model weights are NOT bundled (~30 MB); downloaded by the user from
  https://huggingface.co/pipecat-ai/smart-turn-v3.
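
The hold-and-rescore behaviour in the Wiring bullet can be approximated with a simulated clock, as in the sketch below. The 200 ms re-score interval and 1200 ms cap come from the text; the 0.5 threshold, the function name and the `predict(elapsed_ms)` callback are assumptions for illustration, and the cancel paths (VAD speech_start, STT-side commit) are left out.

```python
from typing import Callable, Tuple


def semantic_hold_decision(
    predict: Callable[[int], float],
    threshold: float = 0.5,  # hypothetical default, not taken from the SDK
    rescore_ms: int = 200,
    max_hold_ms: int = 1200,
) -> Tuple[str, int]:
    """Return (trigger, elapsed_ms) for one VAD speech_end.

    `predict(elapsed_ms)` stands in for scoring the rolling audio window
    after `elapsed_ms` of further silence.
    """
    elapsed = 0
    while True:
        if predict(elapsed) >= threshold:
            # Model is confident the turn is over: finalize early.
            return ("semantic_turn_detector", elapsed)
        if elapsed + rescore_ms > max_hold_ms:
            # Cap reached: fall back to plain VAD-silence endpointing.
            return ("vad_silence", max_hold_ms)
        elapsed += rescore_ms
```

A caller who pauses mid-sentence would score low for a few rounds and only finalize once the model crosses the threshold or the cap expires, whichever comes first.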

Also fixes a scratch-buffer aliasing bug in the TS mixed-radix FFT base
case (every n=50 sub-transform corrupted its even half and then
overwrote it with the odd half), caught by the Python-generated
reference-value parity test.

Tests: pytest 2382 passed / exit 0 (ONNX session is the only mocked
boundary, tagged @pytest.mark.mocked); vitest 1896 passed / exit 0
(*.mocked.test.ts twins); tsc --noEmit clean.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(speech-events): single committed-EOS emission at transcript commit in pipeline mode

Integration reconciliation between the unconditional pipeline speech
events (this branch) and the semantic turn detector's EOS stamping (the
smart-turn feature, written against a base where pipeline EOS never
fired):

- Pipeline EOS fires exactly once per committed turn, AT transcript
  commit (the analogue of Realtime's input_audio_buffer.committed) —
  before the hook veto and handler-availability checks — covering the
  on_message path and orphaned turns that the old emission point next to
  record_turn_committed missed.
- The semantic detector's stamped trigger is consumed at that single
  point (semantic_turn_detector | vad_silence | manual_commit); the
  duplicate emission the feature carried is removed in both SDKs.
- TS emitUserSpeechEos gains the vad_silence/manual_commit resolution
  Python already had (TS previously hardcoded vad_silence) and an explicit-trigger
  arg for the Realtime path.
- Released speculative turns (preemptive generation) bypass the dispatch
  path entirely: the release commit now performs the same semantic
  cleanup + EOS emission so combining the two opt-ins neither leaks a
  stale stamped trigger nor skips the event.
- Detector tests updated to the merged contract (EOS always fires; only
  the trigger differs).
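
A rough sketch of the single-emission contract follows. The three trigger values are the ones named above; the `EosEmitter` class and the rule used to choose between vad_silence and manual_commit (whether VAD ended the speech) are assumptions made for the example.

```python
from enum import Enum
from typing import List, Optional


class EouTrigger(str, Enum):
    SEMANTIC_TURN_DETECTOR = "semantic_turn_detector"
    VAD_SILENCE = "vad_silence"
    MANUAL_COMMIT = "manual_commit"


class EosEmitter:
    """Hypothetical helper: one EOS event per committed transcript."""

    def __init__(self) -> None:
        self._stamped: Optional[EouTrigger] = None
        self.events: List[EouTrigger] = []

    def stamp(self, trigger: EouTrigger) -> None:
        # Called by the semantic detector when it decides the commit.
        self._stamped = trigger

    def on_transcript_commit(self, vad_ended_speech: bool) -> EouTrigger:
        fallback = (
            EouTrigger.VAD_SILENCE
            if vad_ended_speech
            else EouTrigger.MANUAL_COMMIT
        )
        trigger = self._stamped or fallback
        self._stamped = None  # consume the stamp so it cannot leak forward
        self.events.append(trigger)
        return trigger
```

Consuming the stamp at the same point that emits the event is what keeps a stale semantic trigger from being attributed to the following turn.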

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(recording): Python — carrier-neutral local call recording (SDK-side stereo WAV)

serve(local_recording=True) records every call from the media stream the
SDK already proxies — no carrier recording API, no recording fees, audio
never leaves the process. Works on Twilio, Telnyx and Plivo in every
engine mode (pipeline / OpenAI Realtime / ElevenLabs ConvAI); independent
of the carrier-side `recording` flag and off by default.

- New getpatter/audio/call_recorder.py: LocalCallRecorder writes an
  interleaved stereo WAV (left=caller, right=agent — the QA-standard
  layout), 16-bit PCM @ 16 kHz; mulaw 8k / pcm16 8k / pcm16 24k inputs are
  decoded per channel with stateful resamplers. Caller-clocked alignment:
  inbound PSTN frames are the wall clock, agent TTS bursts drain at that
  rate from a bounded FIFO (60 s cap, overflow force-flushed), the idle
  channel is zero-padded.
- Hot-path safe: 64 KiB buffered writes (no per-frame disk I/O), bounded
  memory, any I/O error disables the recorder without touching the call.
- Placeholder RIFF header is patched on close(); every handler cleanup
  path (including abnormal carrier WS drops) finalizes, so truncated
  calls still yield parseable WAVs.
- Wiring: EmbeddedServer.create_local_recorder resolves the target path
  (explicit dir string > call-log dir next to metadata.json/
  transcript.jsonl > ./recordings fallback); the three telephony bridges
  attach the recorder before handler start and surface `recording_path`
  in the on_call_end payload; CallLogger.log_call_end persists it in
  metadata.json. Because the WAV lives in the per-call log directory,
  PATTER_LOG_RETENTION_DAYS sweeps recordings too.
- Tests: WAV header/channel/length round-trips via stdlib wave,
  both-direction capture, caller-clock alignment + silence padding, encoding
  decodes, bounded backlog, buffered-write batching, abnormal-teardown
  finalization, idempotent close, path resolution + sanitization,
  bridge-level recording_path surfacing, retention sweep covering
  recordings, and config-off ⇒ zero filesystem writes.
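
The placeholder-header technique can be illustrated with a small writer like the one below. It is a simplified sketch, not LocalCallRecorder: `StereoWavWriter` is a hypothetical name, it writes unbuffered, and it expects both channels already decoded to PCM16 at the target rate.

```python
import struct


class StereoWavWriter:
    """Write interleaved stereo PCM16; patch the RIFF sizes on close()."""

    def __init__(self, path: str, sample_rate: int = 16000) -> None:
        self._rate = sample_rate
        self._data_bytes = 0
        self._closed = False
        self._f = open(path, "wb")
        self._f.write(self._header(0))  # placeholder sizes

    def _header(self, data_bytes: int) -> bytes:
        channels, bits = 2, 16
        block_align = channels * bits // 8
        return struct.pack(
            "<4sI4s4sIHHIIHH4sI",
            b"RIFF", 36 + data_bytes, b"WAVE",
            b"fmt ", 16, 1, channels, self._rate,
            self._rate * block_align, block_align, bits,
            b"data", data_bytes,
        )

    def write_frames(self, caller: bytes, agent: bytes) -> None:
        # Pad the shorter channel with silence, then interleave L/R samples.
        n = max(len(caller), len(agent)) // 2
        left = struct.unpack(f"<{n}h", caller.ljust(n * 2, b"\x00")[: n * 2])
        right = struct.unpack(f"<{n}h", agent.ljust(n * 2, b"\x00")[: n * 2])
        samples = [s for pair in zip(left, right) for s in pair]
        data = struct.pack(f"<{2 * n}h", *samples)
        self._f.write(data)
        self._data_bytes += len(data)

    def close(self) -> None:
        if self._closed:  # idempotent close
            return
        self._closed = True
        self._f.seek(0)
        self._f.write(self._header(self._data_bytes))
        self._f.close()
```

Because the header is rewritten from a running byte count, any teardown path that reaches `close()` leaves a file that standard WAV readers can parse.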

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(recording): TypeScript — carrier-neutral local call recording (parity with Python)

serve({ localRecording: true }) records every call SDK-side as a stereo
WAV (left=caller, right=agent), 16-bit PCM @ 16 kHz — same defaults,
layout, payload keys and metadata shape as the Python SDK.

- New src/audio/call-recorder.ts: LocalCallRecorder with per-channel
  decode (mulaw_8k / pcm16_8k / pcm16_24k / pcm16_16k → PCM16 16 kHz via
  stateful resamplers), caller-clocked alignment with a bounded 60 s
  agent FIFO, 64 KiB batched writeSync (no per-frame disk I/O), and a
  placeholder RIFF header patched on close(). Exported from the package
  index (mirrors Python's importable getpatter.audio.call_recorder).
- StreamHandler taps: caller audio at the top of handleAudio (above
  every engine-mode guard, wire codec from bridge.inputWireFormat);
  agent audio in encodePipelineAudio — the single chokepoint for all
  pipeline sends, decoding the carrier-native μ-law fast path instead of
  skipping — and in onAdapterAudio for Realtime/ConvAI (μ-law wire,
  PCM16 16 kHz for non-negotiated ConvAI).
- fireCallEnd finalizes the WAV on both teardown funnels (handleStop and
  the abnormal handleWsClose) and surfaces `recording_path` in the
  onCallEnd payload; EmbeddedServer.makeLocalRecorder resolves the
  target path (explicit dir > call-log dir > ./recordings fallback) and
  CallLogger.logCallEnd persists recording_path in metadata.json, with
  callDir made public so the WAV lands next to transcript.jsonl and is
  covered by the PATTER_LOG_RETENTION_DAYS sweep.
- Tests: real WAV byte round-trips (header fields, stereo mapping,
  sample rate, lengths), both-direction capture through the live handler
  taps, alignment + silence padding, encodings, bounded backlog,
  buffered-write batching, abnormal-teardown finalization, idempotent
  close, makeLocalRecorder path resolution + sanitization, retention
  sweep covering recordings, and config-off ⇒ zero writes / no
  recording_path key.
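
Caller-clocked alignment with a bounded agent FIFO might look like the sketch below. It reads "force-flushed" as emitting the overflow against caller silence, which is an interpretation rather than something the commit states; `CallerClockedAligner` and its methods are hypothetical, and the cap is given in bytes instead of seconds.

```python
from typing import Optional, Tuple


class CallerClockedAligner:
    """Pair each caller frame with an equal-length slice of agent audio."""

    def __init__(self, max_backlog_bytes: int) -> None:
        self._fifo = bytearray()
        self._max = max_backlog_bytes

    def push_agent(self, pcm: bytes) -> Optional[Tuple[bytes, bytes]]:
        """Queue agent audio; return a (caller, agent) pair on overflow."""
        self._fifo.extend(pcm)
        overflow = len(self._fifo) - self._max
        if overflow <= 0:
            return None
        flushed = bytes(self._fifo[:overflow])
        del self._fifo[:overflow]
        # Oldest agent audio is written out against caller-side silence.
        return bytes(overflow), flushed

    def on_caller_frame(self, caller: bytes) -> Tuple[bytes, bytes]:
        """The caller frame is the clock: drain the same number of bytes."""
        n = len(caller)
        agent = bytes(self._fifo[:n])
        del self._fifo[:n]
        # Zero-pad when the agent channel is idle or runs short.
        return caller, agent.ljust(n, b"\x00")
```

Driving the timeline from inbound frames keeps the two channels in step even when TTS arrives in bursts much faster than realtime.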

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* docs(changelog): local_recording / localRecording — carrier-neutral local call recording

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(python): warm transfer + multi-agent handoff

Two related in-call capabilities, both opt-in with zero behavior change
when unused:

WARM TRANSFER — transfer_call / CallControl.transfer gain carrier-neutral
mode ("cold" default, byte-identical blind redirect | "warm") + summary.
Warm mode on Twilio parks the caller in a per-call conference on hold
music, dials the human agent with the summary spoken first (<Say>), then
bridges the two as the AI leg ends. New signature-validated, fail-closed
webhooks: /webhooks/twilio/conference (lifecycle observability) and
/webhooks/twilio/warm-status (releases a caller stuck on hold when the
human never answers). Telnyx/Plivo return a clear {error} envelope and
keep the AI on the line — never a silent blind-redirect fallback.
Invalid modes are rejected with an error envelope on every path.

MULTI-AGENT HANDOFF — agent(handoffs={name: Agent}) injects a built-in
handoff_to(name, reason?) tool (names enum-constrained). Calling it (or
PipelineStreamHandler._perform_handoff programmatically) swaps the live
call to the target agent's system prompt, tools, variables, guardrails,
text transforms, consult tool, and onward handoffs — history preserved,
a [handoff] system line recorded and never replayed as a fabricated user
turn. Pipeline mode: LLMLoop.update_agent swaps prompt + tool list for
the next turn. Realtime mode: new OpenAIRealtimeAdapter.update_session
sends a partial session.update (GA adapter adds the mandatory
"type": "realtime" discriminator) BEFORE the function result so the next
response already runs as the target. Unknown targets / malformed args
return error envelopes — never silence. Audio infra established at call
start (STT/TTS/engine connection, hence voice on engines that cannot
switch mid-session) is retained; chained handoffs follow the target map.
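
As a rough illustration of the enum-constrained tool and the error envelopes, the sketch below builds a JSON-schema function definition from a handoff map and validates a call against it. The helper names, the description text and the exact schema layout are assumptions; only the tool name `handoff_to`, its `name` and optional `reason` arguments, and the `{error}` envelope shape come from the text above.

```python
from typing import Any, Dict


def build_handoff_tool(handoffs: Dict[str, Any]) -> Dict[str, Any]:
    """Build a function-tool schema whose `name` is limited to known targets."""
    return {
        "type": "function",
        "name": "handoff_to",
        "description": "Hand the live call off to another agent.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "enum": sorted(handoffs)},
                "reason": {"type": "string"},
            },
            "required": ["name"],
        },
    }


def resolve_handoff(handoffs: Dict[str, Any], args: Any) -> Dict[str, Any]:
    """Return the target agent, or an error envelope (never silence)."""
    name = args.get("name") if isinstance(args, dict) else None
    if not isinstance(name, str):
        return {"error": "malformed arguments: 'name' is required"}
    if name not in handoffs:
        return {"error": f"unknown handoff target: {name}"}
    return {"ok": True, "target": name, "agent": handoffs[name]}
```

The enum keeps most bad targets from being generated in the first place, and the runtime check covers programmatic calls and models that ignore the schema.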

Tests: tests/test_handoff.py + tests/unit/test_warm_transfer_unit.py —
authentic, mocking only the carrier REST boundary (@pytest.mark.mocked).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(typescript): warm transfer + multi-agent handoff — full parity with Python

Mirrors the Python SDK feature-for-feature (snake_case <-> camelCase):

WARM TRANSFER — TRANSFER_CALL_TOOL gains mode ("cold" | "warm") + summ…