fix(pipeline): arm barge-in for the whole carrier-buffered playback window #164
Merged
…indow

Agent-runtime LLMs (HermesLLM/OpenClawLLM) deliver the whole reply at once after a long thinking pause, so TTS outruns realtime and the carrier ends up holding tens of seconds of queued audio. The speaking state ended a fixed 1.5s grace after the last *push*, not the last *playback*: for most of the audible reply isSpeaking was already false, VAD/transcript events were treated as a calm next turn, send_clear never fired, and the agent 'detected the barge-in but kept talking'.

Track an estimated playback cursor (_playback_buffered_until / playbackBufferedUntil) advanced per pushed chunk at its real byte rate (PCM16@16kHz = 32 B/ms, carrier-native mulaw@8kHz = 8 B/ms), and split end-speaking-with-grace into two phases:
- phase 1 holds isSpeaking=true with tailGraceActive=false for the whole estimated backlog (barge-in stays armed and takes the full cancel + send_clear path, dropping the carrier buffer);
- phase 2 is the unchanged echo-tail grace.

Barge-in cancels reset the cursor. No new config; token-paced LLMs (no backlog) behave identically to before, and PATTER_TTS_TAIL_GRACE_MS=0 still forces the legacy synchronous flip. Full Python/TS parity with mirrored unit tests (RED without the fix).
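A minimal Python sketch of the cursor arithmetic described above, assuming a millisecond clock and the two byte rates quoted in the commit message. `PlaybackCursor`, `speaking_phase` and the phase names are hypothetical illustrations, not the SDK's `_playback_buffered_until` implementation.

```python
from dataclasses import dataclass

# Wire byte rates quoted in the commit message.
PCM16_16K_BYTES_PER_MS = 16_000 * 2 / 1000  # 16 kHz, 2 bytes/sample -> 32.0
MULAW_8K_BYTES_PER_MS = 8_000 * 1 / 1000    # 8 kHz, 1 byte/sample  -> 8.0


@dataclass
class PlaybackCursor:
    """Hypothetical estimate of when the carrier finishes its queued audio."""

    bytes_per_ms: float
    buffered_until_ms: float = 0.0

    def push(self, chunk_len: int, now_ms: float) -> None:
        # An idle carrier starts playing "now"; a busy one appends to its backlog.
        start_ms = max(self.buffered_until_ms, now_ms)
        self.buffered_until_ms = start_ms + chunk_len / self.bytes_per_ms

    def backlog_ms(self, now_ms: float) -> float:
        return max(0.0, self.buffered_until_ms - now_ms)

    def reset(self, now_ms: float) -> None:
        # A barge-in cancel clears the carrier buffer, so nothing remains queued.
        self.buffered_until_ms = now_ms


def speaking_phase(cursor: PlaybackCursor, now_ms: float, tail_grace_ms: float) -> str:
    """Hypothetical two-phase end-of-speech state, evaluated after the last push."""
    if tail_grace_ms <= 0:
        return "idle"        # grace disabled: legacy synchronous flip
    if cursor.backlog_ms(now_ms) > 0:
        return "playback"    # phase 1: isSpeaking=True, tailGraceActive=False
    if now_ms < cursor.buffered_until_ms + tail_grace_ms:
        return "tail_grace"  # phase 2: echo-tail grace after estimated playout
    return "idle"
```

For example, a 32,000-byte PCM16 chunk (1000 ms of audio) pushed at t=0 keeps the sketch in phase 1 until t=1000 ms, so a barge-in at t=500 ms still takes the cancel path. Anchoring phase 2 at the estimated end of playout, rather than at the last push, is this sketch's reading of the description.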
…it-style truncation)

Two gaps with agent-runtime LLMs (Hermes/OpenClaw), building on the playback cursor from the previous commit:
- Mid-turn barge-in: the whole reply was already synthesized into the carrier buffer, so the '[interrupted by caller]' marker was appended to the FULL text; a stateful runtime believed the caller heard everything.
- Post-complete barge-in (during the buffered tail): no marker at all; history kept the full reply the caller never finished hearing.

Track per-turn (sentence, playback_start) segments at each sentence's first audible chunk (filler and llm_error_message audio advance the clock but add no segment), map heard = total_pushed - carrier_backlog to a sentence-granular prefix, and:
(a) the streaming path records '<heard prefix> [interrupted by caller]';
(b) the barge-in cancel paths rewrite the last assistant history entry the same way before clearing the buffer.

Legacy full-text marker preserved when no segments were tracked. Full Python/TS parity with mirrored unit tests.
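A sketch of the heard-prefix mapping, assuming the segment clock and the carrier backlog are already expressed in milliseconds and that a sentence counts as heard once its first audible chunk has played. `Segment`, `history_text_on_barge_in` and that threshold choice are hypothetical; the commit only states that the prefix is sentence-granular.

```python
from dataclasses import dataclass
from typing import List

INTERRUPTED_MARKER = "[interrupted by caller]"


@dataclass(frozen=True)
class Segment:
    sentence: str
    playback_start_ms: float  # cursor position of the sentence's first audible chunk


def heard_sentences(segments: List[Segment], total_pushed_ms: float,
                    backlog_ms: float) -> List[str]:
    # heard = total_pushed - carrier_backlog
    heard_ms = total_pushed_ms - backlog_ms
    return [s.sentence for s in segments if s.playback_start_ms < heard_ms]


def history_text_on_barge_in(full_text: str, segments: List[Segment],
                             total_pushed_ms: float, backlog_ms: float) -> str:
    if not segments:
        # No segments tracked: keep the legacy full-text marker.
        return f"{full_text} {INTERRUPTED_MARKER}"
    prefix = " ".join(heard_sentences(segments, total_pushed_ms, backlog_ms))
    return f"{prefix} {INTERRUPTED_MARKER}" if prefix else INTERRUPTED_MARKER
```

If the first 400 ms of a turn were filler audio, the first real sentence would carry `playback_start_ms=400`: the filler moves the clock forward without contributing a segment, so it never appears in the rewritten history.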
Added a second commit (…)
🤖 Generated with Claude Code
nicolotognoni pushed a commit that referenced this pull request on Jun 10, 2026
Opt-in agent.barge_in_mode="pause_resume" (default "cancel" keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks:
- PAUSE: gate the sentence/audio send loops on _output_paused and send_clear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused (output is silent, so the line is echo-quiet) and the inbound ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate: the existing _handle_barge_in/_commit_transcript filter family) within barge_in_confirm_ms (default 1500 ms) runs the existing _do_cancel_for_barge_in path and discards the paused buffers. The overlap window anchored at pause time is preserved so InterruptionMetrics.detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: the window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (the first sentence not fully played, derived from the #164 _playback_buffered_until cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via record_overlap_end(was_interruption=False) (the backchannel counter, never an interruption) plus a false_interruption event.

The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate. Turn bodies wait out an in-flight pause decision before ending (bounded by the confirm window) so buffered sentences are never orphaned.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
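The PAUSE/KILL/RESUME flow above can be sketched as a small, clock-driven state machine. This is a hypothetical Python illustration, not the SDK's code: it models only the sentence buffer (cap 32) and the confirm window (1500 ms default), uses an `accept_final` callable as a stand-in for the echo/hallucination/duplicate filter family, and leaves out retained audio, send_clear and metrics.

```python
from enum import Enum
from typing import Callable, List, Optional


class State(Enum):
    SPEAKING = "speaking"
    PAUSED = "paused"    # output gated, carrier cleared, LLM/TTS still running
    KILLED = "killed"    # confirmed barge-in: full cancel, paused buffers dropped
    RESUMED = "resumed"  # false interruption: buffered sentences released


class PauseResumeSketch:
    def __init__(self, confirm_ms: float = 1500.0, max_sentences: int = 32,
                 accept_final: Callable[[str], bool] = lambda t: bool(t.strip())):
        self.confirm_ms = confirm_ms
        self.max_sentences = max_sentences
        # Stand-in for the echo/hallucination/duplicate filter family.
        self.accept_final = accept_final
        self.state = State.SPEAKING
        self.paused_at_ms: Optional[float] = None
        self.buffered: List[str] = []

    def on_vad_speech_start(self, now_ms: float) -> None:
        if self.state is State.SPEAKING:
            self.state = State.PAUSED
            self.paused_at_ms = now_ms

    def on_sentence(self, sentence: str) -> bool:
        """Return True if the sentence may be synthesized and sent right away."""
        if self.state is State.PAUSED:
            if len(self.buffered) >= self.max_sentences:
                self._kill()  # overflow while paused degrades to a full cancel
                return False
            self.buffered.append(sentence)
            return False
        return self.state is not State.KILLED

    def on_final_transcript(self, text: str, now_ms: float) -> None:
        if (self.state is State.PAUSED and self.paused_at_ms is not None
                and now_ms - self.paused_at_ms <= self.confirm_ms
                and self.accept_final(text)):
            self._kill()

    def tick(self, now_ms: float) -> List[str]:
        """Resume once the confirm window lapses; return sentences to release."""
        if (self.state is State.PAUSED and self.paused_at_ms is not None
                and now_ms - self.paused_at_ms > self.confirm_ms):
            self.state = State.RESUMED
            released, self.buffered = self.buffered, []
            return released
        return []

    def _kill(self) -> None:
        self.state = State.KILLED
        self.buffered.clear()
```

Keeping the LLM and TTS streams alive while PAUSED is what lets RESUME release the buffered sentences without requesting them again; a kill simply drops them.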
nicolotognoni pushed a commit that referenced this pull request on Jun 10, 2026
…ipt)

TypeScript port of 3877814: exact parity with the Python semantics, defaults, and events (camelCase ↔ snake_case naming).

Opt-in agent.bargeInMode: 'pause_resume' (default 'cancel' keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks:
- PAUSE: gate the sentence/audio send loops on outputPaused and sendClear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused and the inbound ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate: the existing handleBargeIn/commitTranscript filter family) within bargeInConfirmMs (default 1500 ms) runs the existing runBargeInCancel path and discards the paused buffers. The overlap window anchored at pause time is preserved so detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: the window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (the first sentence not fully played, derived from the #164 playbackBufferedUntil cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via recordOverlapEnd(false) (the backchannel counter, never an interruption) plus a 'false_interruption' event ({ resumedSentences }).

The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate. Turn bodies wait out an in-flight pause decision before ending; this completes the predecessor's in-progress port by bounding awaitPauseDecision (confirm window + 5 s fail-open margin, mirroring Python's _await_pause_decision) so a teardown race can never strand the dispatch loop.

Tests mirror tests/unit/test_barge_in_pause_resume.py: pause gates without cancelling, paused buffering + overflow degradation, resume tail replay + false-interruption metrics/event, kill filters (final-only / hallucination / duplicate / frozen-prefix history rewrite), legacy cancel mode untouched, config-off defaults, streaming-loop integration (resume, kill, stream-ends-paused), and teardown mid-pause.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
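The fail-open bound on the pause-decision wait is easy to get wrong, so here is a minimal asyncio sketch of the pattern. It is written independently as a Python analogue of the bounded `awaitPauseDecision`; it assumes the decision is signalled through an `asyncio.Event`, and the function name and `margin_s` parameter are hypothetical, with 5 s as the default to match the description.

```python
import asyncio


async def await_pause_decision(decided: asyncio.Event, confirm_ms: float,
                               margin_s: float = 5.0) -> bool:
    """Hypothetical bounded wait for an in-flight pause decision.

    Returns True if the decision arrived, False if the wait failed open.
    """
    # Bound = confirm window + fail-open margin.
    timeout_s = confirm_ms / 1000 + margin_s
    try:
        await asyncio.wait_for(decided.wait(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        # Fail open: let the turn end instead of stranding the dispatch loop.
        return False
```

Returning False instead of raising lets the turn body finish normally after a teardown race in which nobody ever sets the event.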
FrancescoRosciano added a commit that referenced this pull request on Jun 11, 2026
…mantic EOU + review-wave fixes (both SDKs) (#169)

* fix(providers): add missing Union import in 8 provider modules

Several STT/TTS/realtime provider option classes reference Union[...] in type annotations but never import it. `from __future__ import annotations` masked the omission at import time, but typing.get_type_hints() and other runtime annotation introspection (Pydantic, docs tooling, inspect with eval_str=True) raised `NameError: name 'Union' is not defined`. Affected: assemblyai_stt, cartesia_stt, soniox_stt, whisper_stt, rime_tts, lmnt_tts, gemini_live, ultravox_realtime. Python-only fix (TS unaffected).

https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P

* fix(llm): surface HTTP errors from non-OpenAI providers (TS)

The TypeScript Anthropic/Google/Groq/Cerebras providers returned silently on a non-2xx LLM response instead of throwing. Two regressions followed:
- FallbackLLMProvider treated a generator that completed with zero chunks as success, so it never failed over to the next provider.
- The stream handler only speaks `agent.llmErrorMessage` when the LLM loop throws, so a silent return produced dead air on the call.

Python (anthropic/google via vendor SDKs, groq/cerebras via the openai SDK) already raises on HTTP errors, and the TS OpenAI provider already throws PatterConnectionError; these four were the outliers. Make them throw PatterConnectionError too, and cap the logged/thrown error body to 200 chars (provider 401 bodies have been observed to embed the rejected API-key prefix).

Updates the two Cerebras tests that asserted the old silent-drain behaviour to expect the throw while still verifying the recovery-hint log.
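Why a silent return defeats failover can be seen in a stripped-down fallback loop. The sketch below is hypothetical Python (the actual change is in the TS providers): `ProviderHTTPError` stands in for PatterConnectionError, and the loop, like the FallbackLLMProvider behaviour described above, fails over only when a provider raises.

```python
from typing import Callable, Iterator, List, Optional, Sequence

MAX_ERROR_BODY_CHARS = 200


class ProviderHTTPError(Exception):
    """Hypothetical stand-in for PatterConnectionError."""


def check_response(status: int, body: str) -> None:
    if not 200 <= status < 300:
        # Cap the body: provider 401 bodies may echo part of the rejected key.
        raise ProviderHTTPError(f"HTTP {status}: {body[:MAX_ERROR_BODY_CHARS]}")


def stream_with_fallback(providers: Sequence[Callable[[], Iterator[str]]]) -> List[str]:
    last_error: Optional[ProviderHTTPError] = None
    for make_stream in providers:
        try:
            # Completing without an exception counts as success, even with 0 chunks.
            return list(make_stream())
        except ProviderHTTPError as exc:
            last_error = exc
    raise last_error or ProviderHTTPError("no providers configured")


def silent_failure() -> Iterator[str]:
    return iter(())  # old behaviour: a non-2xx response ends the stream quietly


def raising_failure() -> Iterator[str]:
    check_response(401, "invalid api key")
    yield "unreachable"


def backup() -> Iterator[str]:
    yield "Hello"
```

With `silent_failure` first, `stream_with_fallback([silent_failure, backup])` returns an empty reply and never reaches `backup`: the dead-air case. With `raising_failure` first, the loop fails over and returns `["Hello"]`.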
https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P

* fix(python): resolve undefined names in public type annotations + export gaps

- client.py: PipelineHooks/ConsultConfig/CallResult/RealtimeTurnDetection (models) and VADProvider/AudioFilter/BackgroundAudioPlayer (providers.base) were referenced in Patter.agent()'s signature but never imported; IDEs and typing.get_type_hints() raised NameError on the SDK's main entry point. Tool/SpeechEventCallback move from TYPE_CHECKING to runtime imports (no cycle), so get_type_hints(Patter.agent) now fully resolves.
- models.py: BargeInStrategy added to the TYPE_CHECKING block (same bug).
- google_llm.py: missing Union import (companion to the earlier 8-module fix), drop dead api_key local, unshadow call_id loop variable.
- __init__.py: 53 provider option enums were re-exported but missing from __all__ (import * / doc tooling missed them); stt/tts package __all__ gain openai_transcribe, elevenlabs_ws, inworld.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(llm): barge-in/tool-dispatch safety, provider protocol fixes, idle stream timeout

Python:
- llm_loop: when cancel_event fires mid-stream every provider returns cleanly, leaving truncated tool-call JSON accumulated; the loop then executed those tools with {} arguments after the caller interrupted (transfer/SMS/booking firing with empty payloads). Bail out before tool dispatch on cancel, and answer malformed-JSON tool calls with an error envelope instead of executing with guessed arguments.
- stream_handler/test_mode: history was snapshotted AFTER pushing the current user turn while LLMLoop._build_messages appends user_text itself, so every request carried the user utterance twice.
- cerebras: 404 model_not_found was swallowed (empty stream looks like success → no fallback failover, no spoken llm_error_message, dead air). Now logs the recovery hint and re-raises, mirroring TS; test updated.
- anthropic/google: prepend a synthetic user turn when history starts with the first_message greeting (Messages API requires user-first; Gemini same shape), map Gemini functionResponse.name back to the real function name via the paired functionCall (spec requires the names to match), subtract cached tokens from Gemini input usage.
- chat_context: to_anthropic folded role:"tool" entries into user turns (Anthropic 400s on tool role); truncate drops leading orphan tool results (bare tool_call_id 400s on OpenAI).
- fallback_provider: forward caller/callee to delegates and only pass the context kwargs each delegate's stream() declares, so a minimal custom provider no longer raises TypeError on every attempt (availability flapping).

TypeScript (mirrors where applicable):
- replace the fixed 30 s whole-stream LLM ceiling with an idle watchdog (createStreamIdleWatchdog, re-armed per chunk) in OpenAI/Anthropic/Google/Groq/Cerebras/OpenAI-compatible providers; idle aborts now throw PatterConnectionError instead of surfacing as a fake barge-in AbortError (parity: Python has no whole-stream ceiling).
- anthropic: handle in-band SSE error events (overloaded_error) by throwing instead of ending the stream as success; user-first guard.
- google: user-first guard, functionResponse.name mapping, cached-token subtraction.
- groq/cerebras shared parser: subtract cached tokens from prompt_tokens (was double-billing cache reads).
- chat-context: same to_anthropic/truncate fixes as Python.
- tests updated to the new contracts + new watchdog unit tests.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(dashboard): ingest crash, phantom live rows, SSE freeze, parity gaps

- store.py: record_call_end crashed with TypeError when the standalone ingest passed metrics as a plain dict (asdict on non-dataclass), and the exception fired AFTER the active row was popped, so every completed call vanished from the standalone dashboard at hangup. Accept dicts.
- store.py: update_call_status now copies the live transcript/turns into the terminal entry (TS already did), so the Twilio statusCallback vs WS stop race no longer blanks the transcript pane.
- both stores: add Plivo 'timeout'/'cancel' to the terminal status set; rows for unanswered/cancelled Plivo dials leaked in the active set forever (phantom live call).
- both servers: Telnyx call.hangup with a no-media cause (busy/no-answer/rejected) now terminal-izes the pre-registered dashboard row (same permanent active-set leak).
- store.py SSE: a force-dropped slow subscriber now receives a close sentinel so its generator ends and EventSource reconnects; previously the dashboard froze forever while showing 'streaming · sse'.
- cli ingest (both SDKs): a finished-call payload is no longer replayed as a fresh call_start (spurious SSE event + started_at = ingest time); stores derive started_at from the metrics duration when absent.
- cli.ts: raise express.json body cap to 5 MB (long-call ingests 413'd and silently vanished).
- api_routes.py: /api/v1/calls/{id} falls back to the active set (TS parity).
- routes.py: clamp negative ?limit; interpret date-only export filters as UTC like JS Date (same query returned different ranges per SDK).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(telephony/server): Telnyx outbound media, raw Ed25519 keys, Plivo wait, WS hardening

Cross-carrier correctness fixes confirmed by the deep review (several found independently by two reviewers):
- Telnyx outbound calls never got a media stream in EITHER SDK: a perf refactor folded streaming_start into actions/answer on call.initiated, but Answer is only valid on incoming legs and call.answered had become a no-op, so callees answered to dead air. Outgoing legs now skip the answer and attach the stream via actions/streaming_start on call.answered.
- Telnyx webhook signature validation only accepted DER/SPKI public keys, but the Telnyx portal issues TELNYX_PUBLIC_KEY as base64 of the RAW 32-byte Ed25519 key, so every webhook 403'd (fail-closed) the moment the documented security feature was enabled. Both forms now verify; tests cover the raw form.
- Plivo call(wait=True) could never resolve: completions/AMD/prewarm were keyed by the dial-time request_uuid while every webhook carries the live CallUUID. The answer webhook now re-keys all per-call bookkeeping (alias_call_id / aliasCallId + client prewarm re-key); the TS Plivo branch also actually routes through maybeAwaitCompletion (wait was silently ignored).
- TS carrier WS: no 'error' listener (an ECONNRESET became an uncaughtException killing every live call), unguarded async 'close' listeners (throwing onCallEnd → unhandled rejection → crash), and ws@8 invoking async listeners unawaited (interleaved handleAudio → VAD state races, out-of-order STT). All three carrier streams now serialize events onto a per-connection FIFO with contained errors + error listeners.
- Per-IP WS cap counted the tunnel's loopback peer: hard ceiling of 10 concurrent calls behind cloudflared/ngrok and a trivial shared-bucket DoS. Loopback peers now key on CF-Connecting-IP / X-Forwarded-For.
- Voicemail drops (Telnyx/Plivo, both SDKs) were awaited inline in webhook handlers including a playback sleep of up to 30 s; carriers timed out and retried, double-speaking the message. Now tracked fire-and-forget tasks; the Telnyx drop also moves from the early call.machine.detection.ended to call.machine.greeting.ended (the beep), so the message is no longer clipped mid-greeting; playback estimate constants aligned (were 2x apart between SDKs).
- machine_end_other now triggers voicemail-drop/prewarm-evict like the other machine_end_* outcomes (both SDKs).
- Telnyx configure_number PATCHed connection_id to /phone_numbers/{id}/voice, which silently ignores it (auto-config 'succeeded' but inbound never routed); association now goes to PATCH /phone_numbers/{id} (all 3 impls).
- Python: completion futures resolve in finally around user on_call_end (a throwing callback stranded wait=True until the 30-min backstop; TS same); serve() no longer crashes on Windows (add_signal_handler); call(from_number=...) was always ignored (the config value won the 'or'); webhook_url now normalised to a bare hostname (schemed values built wss://https://... URLs); outbound Telnyx/Plivo dials leaked a pooled httpx client per call; bridges resolve direction from the store instead of hardcoding 'inbound'; handler.cleanup() guarded in all three bridge finallys; WebSocketDisconnect no longer logged/recorded as a call error; Plivo bridge masks phone numbers in logs and sends the same transcript/conversation_history payload shape as Twilio/Telnyx.
- TS: recording: true now actually starts Plivo recording (worked in Python only).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(telephony): telnyx on_call_end reads conversation_history from the handler

The bridge keeps no history deque of its own (Twilio/Plivo do), so the parity addition referenced an undefined name, which the on_call_end try/except silently swallowed, skipping the callback entirely.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(pipeline): core orchestrator correctness (both SDKs)

Python stream handler:
- transcript/history deques: 'x or deque()' silently replaced the carrier-shared (empty → falsy) deques with private ones, so EVERY on_call_end payload carried an empty transcript and history. Use 'is not None'.
- STT-connect-failure hangup called _hangup_fn(call_id) but the carrier hangup closures take no args; the TypeError was swallowed and the call stayed up, deaf.
- apply_call_overrides round-tripped the Agent through dataclasses.asdict, dict-ifying nested configs and deep-copying live provider objects, so any per-call override from on_call_start crashed the call later. Use dataclasses.replace.
- _await_dispatch_settle: dispatch-turn failures were logged at DEBUG (callers heard silence, operators saw nothing) and CancelledError was swallowed even when the awaiting task itself was being cancelled, defeating teardown. cleanup() now also cancels the STT loop BEFORE the dispatch task so a racing transcript can't respawn an orphan turn, and guards each adapter close individually.
- user on_transcript/on_metrics callbacks are now exception-contained (_safe_on_* helpers): one raise inside the realtime forward loop permanently killed event forwarding (zombie call).
- mcp_servers were silently ignored in pipeline mode (only the realtime handler called _init_mcp_tools); pipeline start() now discovers MCP tools and cleanup() closes the sessions, matching the documented mode-agnostic contract and TS.
- realtime function_call: unknown/handler-less tools and malformed argument JSON now get an error-envelope function_result instead of silence (a dangling call item stalled the model: dead air). Mirrors TS.
- pipeline transfer_call validates E.164 BEFORE invoking the carrier transfer (which silently no-ops on bad targets) and returns the same rejection envelope as the realtime path.
- realtime guardrails: evaluate on accumulated text (per-delta checks never matched terms split across deltas), clear the carrier playout buffer on block, and speak the replacement via the no-fake-turn reassurance path (send_text injected it as a phantom role:user turn the model then replied to).
- barge-in: echo guard now runs BEFORE the tail-grace rescue (the grace window is exactly when the agent's final-sentence echo arrives; the rescue disarmed the downstream echo check and the agent answered its own words); duplicate/hallucination finals are filtered BEFORE cancelling (Deepgram's is_final twin of a just-committed speech_final cancelled the agent's brand-new turn); a strategy-confirmed barge-in now actually flushes the inbound ring, and the pending window forwards audio to STT (with strategies configured but forward-stt off, no transcript could ever arrive, so strategy barge-in was structurally impossible).
- firstMessage: history append no longer gated on metrics being enabled (model could re-greet), echo-guard reference now covers the greeting and non-streaming replies, prewarm pacing derives bytes/ms from the active output format (mulaw 8k prewarm bytes were paced 4x too fast, re-opening the barge-in flush window).
- STT send_audio failures degrade to dropped frames (rate-limited warn) instead of tearing the whole call down via the carrier read loop.
- remote_message: the 30 s asyncio.timeout spanned the generator's whole consumption INCLUDING TTS playback time of each yielded chunk, so long spoken replies were cancelled mid-sentence with no log. Now a per-receive idle timeout.
- services: IVR loop detector compares the newest chunk to its immediate predecessor (max-over-window false-fired on alternating A/B prompts); scheduler cache stores (loop, scheduler) so a reallocated id() can't hand back a scheduler bound to a dead loop; markdown filter no longer eats all prose after a bare '<'.

TypeScript stream handler (mirrors + TS-specific):
- dispatchTask gets its rejection handler AT creation (dispatchTurn is try/finally only; the next turn's catch attached far too late for Node's unhandled-rejection check → process crash). fireCallEnd guards the user onCallEnd; processTranscript guards the user onTranscript.
- same echo-guard-before-tail-grace reorder; same firstMessage / runRegularLlm / WS-remote echo references; runRegularLlm returns its final text instead of the caller re-reading history[-1] (raced a concurrently committed user turn).
- WS-remote turns now honour barge-in at the outer loop (previously kept consuming the remote stream and started a fresh TTS synthesis per chunk after the interrupt) and only bill TTS/turn-complete on a clean finish. remote-message drains buffered frames after done/close (the old !done condition dropped every buffered chunk after the first).
- LLM tool loop: bail out before/between tool executions when the abort signal fired (parity with the Python cancel_event fix: no more side effects from truncated tool JSON after a barge-in).
- speechEvents threaded into StreamHandlerDeps (the public onUserSpeechStarted/.../onAudioOut API never fired on real served calls; only unit tests passed it).
- scheduler.scheduleOnce chains timeouts past Node's 2^31-1 ms clamp (a >24.8-day job fired immediately); IVR note*State respect stop(); test-mode REPL survives provider errors.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(stt/audio): callback containment, sequential reuse, resampler boundary correctness

STT (TypeScript): every provider emit loop now contains BOTH sync throws and async rejections from transcript callbacks. The registered callback is async, so the bare cb(t) (or sync-only try/catch in Deepgram/Speechmatics) left its rejection unhandled and killed the Node process on any user-callback error.

STT (Python):
- whisper/telnyx/speechmatics adapters fully reset per-call state on connect(): close() left a closed httpx client, an unsent WAV-header flag and stale None/_STOP sentinels behind, so a sequential second call on the same adapter instance was deterministically broken (no transcripts / instant loop exit / rejected audio).
- whisper transcriptions are now chained sequentially (both SDKs had the ordering bug; Python fixed here): parallel HTTP requests with OpenAI's latency variance routinely delivered chunk N+1's final before chunk N, scrambling word order in history.
- AssemblyAI reconnect: _running stayed true through the reconnect handshake; the consumer polls it every 100 ms and the TLS+WS handshake takes longer, so a successfully reconnected (billed) session delivered zero transcripts.
- Soniox (both SDKs): add finalize() ({type:'finalize'}) so the VAD speech_end fast-path actually works (every turn previously waited out the full endpointing delay), and stop re-emitting identical interims on token-less keepalive frames.
- OpenAITranscribeSTT (both SDKs): reject verbose_json up front; the gpt-4o transcribe models 400 on it, so every chunk failed (logged only) while audio kept being buffered and billed.
- deepgram: Transcript.words back to a tuple (frozen-dataclass contract); providers.deepgram() helper smart_format default aligned with the class (the two entry points behaved differently); providers.soniox(language=) now maps to language_hints instead of being silently discarded.

Audio:
- StatefulResampler.flush() (py) fed the partial-frame carry to ratecv, which ALWAYS raises on a non-whole frame, so every odd-length stream crashed the flush path. Drop the sub-frame remainder like TS.
- TS 16k→8k FIR decimator rewritten with a real lookahead carry: the old single-pending-sample design processed the carried sample twice (lost the true s-2) and edge-replicated the +2 tap at every chunk end, causing audible crackle at chunk boundaries on the main Twilio outbound path. Chunked output is now bit-identical to one-shot output (regression tests added).
- AEC far-end taps (3 py + 2 ts sites) gated on the carrier-native fast path: with the TTS adapter auto-flipped to ulaw_8000 they pushed mulaw wire bytes into an int16-PCM-16k echo canceller, giving a garbage reference, and odd-length chunks crashed np.frombuffer mid-turn (misreported as an LLM error).
- Silero VAD (py): queue transitions beyond the first per process_frame instead of dropping them (a chunk spanning speech_end→speech_start lost the start event); reset() clears the queue.
- background_audio builtin_clip_path returned a path whose as_file context had already exited; on zip-based installs the extracted temp file was deleted before use. Keep the context open for the process lifetime (same pattern as silero_onnx).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(tools): timeout defaults, breaker probe gating, schema generation, Gemini sanitization

- tool executor (py): handler tools (which is what every MCP tool is) ran UNBOUNDED when no per-tool timeout was declared, so a hung tool froze the realtime event loop indefinitely. Apply the documented 10 s default (mirrors TS); webhook timeouts are now terminal like handler timeouts (retrying multiplied a dead webhook's wait to many minutes per turn); non-JSON-serializable returns no longer burn the whole retry+backoff loop (default=str).
- consult (both SDKs): the consult tool now declares its own timeout budget. TS DefaultToolExecutor raced the handler against the 10 s default and killed any consult longer than that, while the handler's own budget was 30 s; the Python tool needed the declaration for the new executor default.
- circuit breaker (py): HALF_OPEN admitted unlimited concurrent probes (comment said one), so a burst of parallel tool calls hammered a recovering backend. Gate with probe_in_flight like TS.
- @tool schema generation (py): PEP 604 unions (str | None) have origin types.UnionType, not typing.Union; the idiomatic 3.10+ spelling mapped to {type: object} and was wrongly marked required.
Literal[...] now emits enum, list[X] emits items (Gemini rejects array schemas without items).
- define_tool (py) returned a plain dict that Patter.agent(tools=[...]) rejects with TypeError since 0.5.0; it now returns the public Tool.
- Gemini schema sanitization (google_llm.py, gemini_live.py, google-llm.ts, gemini-live.ts): recursively strip JSON-Schema keys the proto Schema rejects ($schema, additionalProperties, oneOf, …). Strict-mode tools REQUIRE additionalProperties:false and nearly every zod-derived MCP server emits $schema, so one such tool 400'd every Gemini turn/session.
- MCP (ts): transport throws from callTool now return the structured error envelope instead of reaching the executor's retry loop, which re-fired non-idempotent MCP tools up to 3x on transient errors (parity with Python).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(engines): GA listener leak, ConvAI wiring, truncate targets

- openai-realtime-2.ts: the connect() monkeypatch registered every message listener as an anonymous wrapper, so removeListener (identity match) never removed the setup listener. It stayed attached for the whole call, and its error branch ran ws.close() on the FIRST benign mid-call error frame (commit-empty, truncate-too-short, parallel tool-result conversation_already_has_active_response), tearing down the live engine socket. The patched on() now keeps a handler→wrapped map and off() translates through it; the setup listener also hard-ignores messages once settled.
- ElevenLabs ConvAI (TS): the adapter hardcoded language 'it' (every conversation forced to Italian or failing initiation) and always sent a voice_id override (ElevenLabs rejects overrides not enabled in the agent's security settings, which broke default-configured agents). Overrides are now sent only when explicitly configured.
buildAIAdapter switches to the options form with ulaw_8000 in/out; the positional form sent no output_format, so ConvAI streamed PCM16@16k onto the mulaw carrier wire (loud static on every TS ConvAI call).
- ElevenLabs ConvAI (py): the Telnyx bridge built the handler without for_twilio=True even though Telnyx negotiates PCMU 8 kHz, so caller mulaw bytes were fed to ConvAI as PCM16@16k (garbled in both directions; Twilio/Plivo branches were already correct).
- OpenAI Realtime (both SDKs, v1+GA): response.output_item.added also fires for function_call items. Recording those as the truncate target made barge-in during a tool turn truncate a non-message item, which the server rejects with an error event. Only message items are tracked now.
- OpenAI Realtime GA (both SDKs): the GA session schema removed 'temperature'; forwarding it made the session.update fail and the call drop at pickup. Warn-and-skip.

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* fix(observability): GA cost attribution, metric correctness, telemetry opt-out hygiene

- cost: openai_realtime_2 calls fell through to the pipeline branch of _compute_cost in BOTH SDKs (exact-string match on 'openai_realtime') and reported $0 AI cost for the most expensive engine, while still emitting cached savings. TS also never captured realtimeModelName / the realtime provider tag for GA calls.
- agent_response_ms (py): llm_ttft_ms is initialised to 0.0 so the 'is not None' gate was always true; when no first-token signal fired, the flagship SLO metric silently EXCLUDED the whole LLM segment. Now gated on the actual signal (TS already leaves it undefined).
- EOU delay (py): the final-transcript path stamped record_vad_stop unconditionally, overwriting the real VAD speech_end stamp microseconds before stt_final, so end_of_utterance_delay was always ≈0 and the fake endpoint signal defeated record_stt_complete's own don't-fake logic. The fallback stamp is now first-wins.
- InterruptionMetrics units: Python emitted SECONDS, TS milliseconds, so cross-SDK consumers were 1000x apart for the same event. Python now emits ms (every other latency field is *_ms); TS gains Python's early-return so stray overlap-ends no longer inflate interruption counts. Docstring + regression test updated.
- call-log (ts): logTurn/logEvent/logCallEnd re-derived the day directory from Date.now(), so calls crossing midnight UTC split across two day dirs; the original metadata stayed 'in_progress' forever and the dashboard hydrate resurrected phantom live calls. Per-call startedAt map added (mirrors Python).
- telemetry opt-out (both SDKs): the environment-dims helper ran unconditionally at construction and its previousVersion probe WROTE ~/.getpatter/version, violating the documented 'opting out never touches the filesystem' invariant. Now gated on enabled. Numeric dimensions (latency_ms/cost_usd/…) now require numbers, closing the one gap that let free text reach the wire.
- pricing (both SDKs): gpt-4.1 $3/$12 → $2/$8 and gpt-4.1-mini $0.80/$3.20 → $0.40/$1.60 (published OpenAI rates; siblings were correct); gpt-4o-realtime-preview audio still carried the Oct-2024 launch price ($100/$200), which was cut to $40/$80 in Dec-2024.
- evals (py): one transient judge failure (429/timeout/missing key) aborted the whole suite and discarded every completed case; it is now recorded as a failed case. The verdict is computed locally from the score instead of trusting the judge's self-reported 'passed' (a hallucinated passed:true with score 0.2 recorded a pass).
- observability exports (ts): shutdownTracing/withSpan/recordPatterAttrs/patterCallScope/attachSpanExporter were not exported from the package root, so users could not flush the BatchSpanProcessor Patter creates (NodeTracerProvider does not flush on exit), silently dropping trailing spans.
- minor parity: Python _opt_avg now filters zeros like TS optAvg; TS recordTtsFirstByte emits only inside the first-byte latch (it previously re-emitted stale TTFB events); stale 'masked by default' phone-redaction docstrings corrected in both SDKs.

* fix(tts): Telnyx wire format, Cartesia 8k double-decimation, chunker/facade parity

- ElevenLabs 'native format' for Telnyx was declared pcm_16000 across the TTS layer (both SDKs: native-format maps, for_telnyx factories, and the stream handlers' native checks) while the SDK's own streaming_start pins the Telnyx wire to PCMU/μ-law @ 8 kHz — the native fast path therefore shipped raw PCM16 bytes onto a μ-law wire: pure static on every default ElevenLabs-on-Telnyx call. Every surface now agrees on ulaw_8000 (the TS handler check also gates on known carriers); tests updated to the corrected contract.
- CartesiaTTS.for_twilio/forTwilio (and the pipeline facades, both SDKs) requested sample_rate=8000, but the audio sender has no consuming hook for a declared TTS rate — it unconditionally runs its fixed 16k→8k decimator, so the 8 kHz audio was decimated AGAIN and played at ~2x speed (chipmunk) on every call using the documented factory. The factories now emit 16 kHz (the pipeline rate).
- sentence chunker (both SDKs): a standard-path emission now ends the aggressive 'first flush' window — only the aggressive flush cleared the flag, so a comma in sentence 2+ could still trigger a clause-level flush mid-turn (choppy prosody, contradicting the documented contract).
- tts/openai.py facade default aligned to gpt-4o-mini-tts (the underlying provider default and the TS facade) — the same nominal config produced different voice/latency/price per SDK.
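To see why the old Cartesia factories played at roughly double speed, here is a minimal arithmetic sketch. It assumes a naive 2:1 decimator that keeps every other sample (`decimate_16k_to_8k` is a hypothetical stand-in, not the SDK's resampler, which may filter before dropping samples) and only tracks sample counts against the 8 kHz wire rate.

```python
from array import array


def decimate_16k_to_8k(samples: array) -> array:
    """Hypothetical 2:1 decimator: keep every other sample (no anti-alias filter)."""
    return samples[::2]


def duration_ms(samples: array, rate_hz: int) -> float:
    """Playback duration of a mono PCM16 buffer at a given sample rate."""
    return 1000.0 * len(samples) / rate_hz


# One second of speech as the TTS would return it.
pcm_16k = array("h", [0] * 16_000)  # fixed factory: 16 kHz (the pipeline rate)
pcm_8k = array("h", [0] * 8_000)    # old for_twilio factory: already 8 kHz

# The sender always decimates 16k -> 8k before writing to the 8 kHz wire.
ok = decimate_16k_to_8k(pcm_16k)
double = decimate_16k_to_8k(pcm_8k)  # decimated a second time

print(duration_ms(ok, 8_000))      # 1000.0: plays at the right speed
print(duration_ms(double, 8_000))  # 500.0: one second of speech squeezed into half a second
```

Because the sender's decimator is fixed, 16 kHz is the only TTS output rate for which the played duration is preserved, which is why the factories now emit it.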
* test(tts): update remaining Cartesia pipeline-facade tests to the 16 kHz contract

* fix(parity): restore the cross-SDK parity suite; fail fast on missing OpenAI key in TS agent()

The parity runner still referenced the pre-monorepo layout (sdk/, sdk-ts/, package 'patter') — every one of the 10 scenarios failed, so the suite had silently stopped guarding parity. Restored:
- paths/imports updated to libraries/python + libraries/typescript/dist and the getpatter package (tool_executor moved to tools/).
- call_init / voice_mode_enum rewritten for the modern carrier-object API (cloud mode and DEFAULT_*_URL report 'removed' on both sides).
- the TS shim silences the telemetry banner and the runner parses the last stdout line, so SDK construction output can't corrupt the JSON protocol.
- sentence_chunker delegates to its dedicated standalone runner (xfail semantics) instead of counting as a failure.

The revived suite immediately caught a real divergence: Python fails fast in agent() when OpenAI Realtime mode has no API key, while TS deferred the check to call time (dead call instead of a clear construction error). TS now validates eagerly too; test helpers gain stub keys.

Suite result: 10/10 scenarios matched (sentence_chunker: 53 pass, 8 xfail).

* fix(dashboard-app): live time window, ringing status, deletion tombstones — rebuilt ui.html

- App.tsx: the bucket strategy captured Date.now() once and was memoized on [range] only, so the time window froze at mount — a dashboard left open past the window edge silently dropped every call that ended after it (within at most 1 hour on the default 24h view). The window now re-anchors on a 30 s tick.
- mappers: Twilio 'ringing'/'queued' statuses mapped through the default branch to 'ended' — every outbound call showed an "ended" pill for the whole 10-30 s ring phase.
They now map to the (already styled, previously unused) 'queued' pill and count as ongoing.
- mergeCallPreserving resurrected soft-deleted calls forever: deletions are absent from the server snapshot by design, and the prev-carry-over loop re-appended them on every refresh (cross-tab deletes never propagated). The calls_deleted SSE payload and local deletes now feed a tombstone set consulted by the merge.
- turnCount's transcript fallback now halves the line count (the transcript holds one line per user AND per assistant message, so it double-counted turns past the percentile gate); the 'All' range sparkline now actually derives its window from data extents (the {fromMs: 0} sentinel was truthy, so every call landed in the rightmost 1970→now bucket).

The dist is rebuilt and synced into both SDKs' dashboard/ui.html (identical md5).

* fix(pipeline): play the greeting as a background task — both SDKs

The firstMessage streamed INLINE: in Python inside PipelineStreamHandler.start(), which the carrier bridge awaits from its single WS read loop — so for the whole greeting no media frames were processed (VAD/barge-in structurally impossible on the first message), stop frames went unnoticed, and prewarmed mark-gated pacing starved because mark acks could never be read (0.5 s timeout per chunk → ~13x slower than realtime, guaranteed jitter underrun). The TS handler had the same shape inside handleCallStart, made symmetric by the recent per-connection FIFO serialization.

Both handlers now await beginSpeaking(is_first_message=true) BEFORE returning (the self-hearing guard engages from the very first inbound frame) and stream the greeting in a tracked background task (_play_first_message / playFirstMessage). Teardown cancels (py) / settles (ts) the task before adapters close; failures log instead of killing call setup.
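The start-up ordering described for the greeting can be sketched with plain asyncio: arm the speaking state before `start()` returns, stream the greeting in a tracked task so the WS read loop stays free, and cancel that task on teardown. `HandlerSketch`, `begin_speaking` and the string "chunks" are hypothetical simplifications; only the `_play_first_message` name comes from the commit, and this is not the handler's real code.

```python
from __future__ import annotations

import asyncio
import logging

log = logging.getLogger("greeting-sketch")


class HandlerSketch:
    """Hypothetical handler: arm state first, then stream the greeting in the background."""

    def __init__(self) -> None:
        self.is_speaking = False
        self.sent: list[str] = []
        self._first_message_task: asyncio.Task | None = None

    async def begin_speaking(self) -> None:
        # Engages the self-hearing guard before any inbound frame is processed.
        self.is_speaking = True

    async def _play_first_message(self, chunks: list[str]) -> None:
        # CancelledError is a BaseException, so teardown cancellation still propagates.
        try:
            for chunk in chunks:
                self.sent.append(chunk)
                await asyncio.sleep(0)  # yield so the WS read loop keeps running
        except Exception:
            log.exception("greeting playback failed")  # logged, never fatal to call setup

    async def start(self, greeting: list[str]) -> None:
        await self.begin_speaking()  # awaited BEFORE returning
        self._first_message_task = asyncio.create_task(self._play_first_message(greeting))

    async def cleanup(self) -> None:
        task = self._first_message_task
        if task is not None and not task.done():
            task.cancel()  # cancel before adapters would be closed
            try:
                await task
            except asyncio.CancelledError:
                pass


async def demo() -> tuple[bool, list[str]]:
    h = HandlerSketch()
    await h.start(["Hello", "how can I help?"])
    armed = h.is_speaking        # already True when start() returns
    await h._first_message_task  # in a real call the read loop runs meanwhile
    await h.cleanup()
    return armed, h.sent


print(asyncio.run(demo()))  # (True, ['Hello', 'how can I help?'])
```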
* fix(stt): clone the configured adapter per call — both SDKs

STT adapters are stateful per-connection objects, but the documented usage hands ONE instance to ONE agent served for MANY calls: concurrent calls shared a socket/queue/callback set, so call B's connect() overwrote call A's WebSocket, each call could receive the OTHER caller's transcripts, and the first hangup closed the surviving call's socket.

Python: STTProvider.__init_subclass__ now captures every subclass's ORIGINAL constructor arguments (outermost call wins through inheritance chains, zero per-provider code — user subclasses included) and a generic clone() replays them; _create_stt_from_config clones provider instances per call, degrading to the legacy shared instance with a loud warning when clone() fails. Verified across all 7 streaming providers.

TypeScript: each provider records its construction args and exposes clone(); createSTT() clones per call with the same loud-warning fallback for adapters without clone().

The identity test is updated to the new contract (same type/config, fresh connection state, distinct per call).

* fix(api): persist default parity, per-call first_message, TS onCallStart overrides

- persist (TS): both SDKs' docs state persistence is ON by default since 0.6.2 (the dashboard hydrate path needs on-disk records across restarts), but resolvePersistRoot had regressed to opt-in — with persist omitted and no PATTER_LOG_DIR, TS silently wrote nothing while Python persisted. Aligned to the documented default.
- call(first_message=...) (py) was documented but never referenced in the body — now applied as a per-call frozen-dataclass copy of the agent so prewarm synthesis, the bridge and the handler all see the override. TS gains the same option (LocalCallOptions.firstMessage) for parity.
- onCallStart per-call overrides (TS): Python has applied a dict returned from on_call_start as per-call agent config since 0.5.x; TS typed the callback void and ignored the result. The handler now applies returned overrides (snake_case keys mirroring apply_call_overrides; the Python-only stt_config/tts_config keys warn-and-skip), the server's logging wrapper forwards the return value instead of swallowing it, and the public callback types accept the override shape.

* fix(audio/core): AEC far-end staleness + adaptation floor; 1h max-call watchdog in Python

- NLMS AEC (both SDKs): the far-end ring only advances on push, so while the agent was silent, processNearEnd convolved the SAME frozen TTS tail into every 20 ms user frame forever — a repeating buzz at echo-estimate amplitude superimposed on user speech, exactly when there is no echo to cancel. The canceller now passes through when the reference is stale (>250 ms since the last far-end push). The adaptation floor also rises from 1e-6 (-120 dBFS — a TTS fade-out still 'counted' as far energy, letting weights blow up against user speech with a near-zero norm) to 1e-3 (≈ -60 dBFS), freezing adaptation on an effectively-silent reference.
- Python stream handlers gain the 1-hour auto-hangup watchdog TS has had all along (armed in each start(), cancelled in cleanup()) — a call whose carrier stop never arrives could previously run and bill forever.

* fix(engines): ConvAI client tools, Gemini Live transcriptions, Ultravox event semantics

- ElevenLabs ConvAI (both SDKs): handle the previously-ignored client_tool_call event — configured client tools stalled until the provider-side timeout and reported failure.
The adapters surface it as the shared function_call event and gain a client_tool_result sender; the handlers route through the tool executor and ALWAYS answer (unknown tools and execution errors included — silence stalls the ElevenLabs agent). transfer_call/end_call declared as ElevenLabs client tools now reach the carrier helpers.
- Gemini Live (both SDKs): enable input/output audio transcription in the session config and parse serverContent.inputTranscription/outputTranscription — native-audio sessions previously produced NO user transcript ever and no assistant transcript in AUDIO modality (logs/history/metrics empty for every Gemini Live call). goAway is now logged loudly (the only warning before the server drops the ~10-15 min session).
- Ultravox (both SDKs): 'listening' is entered after EVERY normal agent turn — mapping it to speech_started cleared the carrier playout buffer at each turn end, clipping the audio tail; turn end is now the speaking→listening transition ('idle' never fires mid-call, so response_done effectively never fired before). Agent transcripts emit delta frames only — full-text frames forwarded as appends duplicated the transcript ('Hel'+'lo'+'Hello').

* fix(observability): pipeline speech events end-to-end + events.jsonl operational log

- Thread the SpeechEvents dispatcher through all three Python carrier bridges into every handler; ConvAI and Pipeline handler ctors now accept/forward speech_events like the Realtime handler already did.
- Emit the documented user/agent speech events from pipeline mode in both SDKs (VAD start/stop, EOS on turn commit, agent begin/end with interrupted=true on barge-in) — previously realtime-only.
- Write events.jsonl (documented since 0.6 but never written): tool_call/tool_result records from role=tool transcript lines, barge_in from interrupted turns, error from CallMetrics.error_code (also persisted as metadata.json "error") — wired in both servers' logging-callback wrappers, with unit tests reading the files back.
- Update stale test contracts surfaced by the full-suite run: remote WS mock gained recv() (per-receive idle-timeout loop), transfer_call now rejects non-E.164 before invoking the carrier helper, guardrail replacement speaks via send_reassurance instead of send_text.

* feat(evals): EvalSession — eval harness that drives the REAL pipeline call loop

The existing eval runner scored an arbitrary reply(text) -> str callable and never exercised the SDK. This adds getpatter.evals.session.EvalSession, which constructs an actual PipelineStreamHandler and injects user turns through the same path a live call uses — the real STT receive loop (_handle_barge_in -> _commit_transcript -> _dispatch_turn), the real LLMLoop with the real ToolExecutor, pipeline hooks, guardrail replacement, dedup/hallucination filtering, history handling, and metrics — with only the paid/external boundary faked:
  * FakeAudioSender (records send_audio/send_clear/send_mark, auto-acks marks)
  * FakeSTT (queue-backed; finals flow through the handler's real _stt_loop)
  * FakeTTS (records spoken sentences, yields 10 ms of silence)
  * ScriptedLLMProvider (deterministic chunk scripts for CI) or any real LLMProvider for live evals

API: `async with EvalSession(agent=..., llm_provider=...) as s:` then `result = await s.user_says("...")` -> frozen TurnResult(agent_text = what the caller heard post-guardrails/hooks, tool_calls, history_snapshot, interrupted, metrics_turn). getpatter.evals.assertions.expect(result) adds chainable tool_called(name, args_subset=) / no_tool_called() / agent_text_contains(...)
and an async judge(llm_judge, intent=...) that reuses the existing LLMJudge.

EvalCase gains optional agent= / llm_provider= fields; EvalRunner routes those through EvalSession while the legacy reply()-factory path (and the `patter eval` CLI, which keeps that contract) is unchanged — both flavours mix in one suite. stream_handler.py is untouched: the session drives existing public/internal methods only.

Tests (no network, scripted provider) prove: (a) a tool-call case asserts via tool_called with the REAL ToolExecutor running a local handler, (b) multi-turn history accumulates with prior turns exactly once (pinning the existing cross-SDK trailing duplicate of the current user message), (c) guardrail replacement is observable in agent_text, (d) cleanup leaves no pending tasks (asyncio.all_tasks comparison), plus commit-filter drops surfacing loudly, hooks, metrics capture, and runner integration.

Python-only by design (the TypeScript CLI already prints an evals stub).

* docs: pipeline-stages decomposition design (InputChain / TurnManager / OutputChain)

Proposes splitting PipelineStreamHandler's three interleaved state machines into composable stages, inventories today's states/transitions and maps every PipelineHooks surface and recently-fixed bug to its owning stage, and defines a 4-slice migration plan that keeps the public API and all existing tests green. Slice 1 (InputProcessingChain + audio_filter wiring) lands separately.

* fix: wire agent.audio_filter into the pipeline via an extracted InputProcessingChain (slice 1)

audio_filter / audioFilter (Krisp / DeepFilterNet) was accepted by the public API, documented as "integrated before VAD and STT", implemented and unit-tested — but never invoked by any pipeline.
Slice 1 of the pipeline-stages decomposition (docs/architecture/pipeline-stages.md) extracts the inbound half of on_audio_received / handleAudio into an InputProcessingChain that owns decode (mulaw->PCM16) -> stateful 8k->16k resample -> AEC near-end -> audio_filter (NEW) -> VAD feed and returns the processed frame + VAD event. The handlers keep the downstream logic (VAD-event handling, self-hearing gate, ring buffer, beforeSendToStt hook, STT feed) so the diff stays reviewable; with no AEC/filter/VAD configured the byte path is identical to before.
- Filter wrapper is fail-open: raise / non-bytes return -> passthrough of the pre-filter PCM, WARN once then DEBUG, keeps attempting.
- AEC/filter/VAD resolved via late-bound getters (start() and test fixtures install _aec/_auto_vad after construction).
- TS chain also owns the per-call VAD error kill switch (former vadDisabled) including the 25 ms ONNX inference timeout race.
- (Python) KrispVivaFilter now re-frames input internally to its configured frame_duration_ms (remainder buffered, dropped on sample-rate change) instead of raising on the pipeline's 20 ms frames vs its 10 ms default.
- Tests: chain-level order assertion (AEC -> filter -> VAD via recording fakes), warn-once passthrough, mulaw + PCM parity vs stateful reference, handler-level proof that agent(audio_filter=...) transforms the bytes reaching a fake STT; Krisp re-framing unit tests.

Both full unit suites pass unchanged (py 1435 passed/19 skipped; ts 1866 passed/8 skipped, tsc clean).

* test(server): make events.jsonl tool-event assertions order-insensitive

Both logEvent calls in the transcript wrapper are fire-and-forget, so the on-disk append order of tool_call vs tool_result is not guaranteed (records carry their own ts). Assert content per type instead of order.
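The fail-open wrapper from slice 1 of the InputProcessingChain (an exception or a non-bytes return passes the pre-filter PCM through, warning once and then logging at DEBUG while still attempting every frame) can be sketched as follows. `FailOpenFilter`, `run_chain` and the plain-callable filter interface are hypothetical names chosen for illustration, not the SDK's API, and the decode/resample stages are omitted.

```python
from __future__ import annotations

import logging
from typing import Any, Callable

log = logging.getLogger("input-chain-sketch")


class FailOpenFilter:
    """Hypothetical wrapper: a broken audio filter must never break the call."""

    def __init__(self, process: Callable[[bytes], Any]) -> None:
        self._process = process
        self._warned = False
        self.failures = 0

    def _report(self, reason: str) -> None:
        self.failures += 1
        if not self._warned:
            log.warning("audio filter failed (%s); passing audio through", reason)
            self._warned = True
        else:
            log.debug("audio filter failed again (%s)", reason)

    def __call__(self, pcm: bytes) -> bytes:
        try:
            out = self._process(pcm)  # attempted on every frame, even after failures
        except Exception as exc:
            self._report(repr(exc))
            return pcm  # passthrough of the pre-filter PCM
        if not isinstance(out, (bytes, bytearray)):
            self._report(f"returned {type(out).__name__}")
            return pcm
        return bytes(out)


def run_chain(pcm: bytes, aec: Callable[[bytes], bytes], flt: FailOpenFilter,
              vad: Callable[[bytes], str | None]) -> tuple[bytes, str | None]:
    """Stage order from the description: AEC near-end -> audio_filter -> VAD feed."""
    frame = flt(aec(pcm))
    return frame, vad(frame)


frame = b"\x01\x02\x03\x04"
broken = FailOpenFilter(lambda pcm: None)      # non-bytes return
doubler = FailOpenFilter(lambda pcm: pcm * 2)  # well-behaved toy filter
print(broken(frame) == frame)  # True
print(run_chain(frame, aec=lambda p: p, flt=doubler,
                vad=lambda p: "speech" if len(p) > 4 else None))
```

Returning the pre-filter frame rather than dropping it keeps VAD and STT fed with usable audio, so a misbehaving denoiser degrades to "no denoising" instead of silence.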
* docs(changelog): document the review-wave fixes under Unreleased

Entries for the fix batches landed on this branch (greeting background task, per-call STT clone, dashboard SPA window/status/tombstones, persist default parity, call(first_message), TS onCallStart overrides, ConvAI client tools, Gemini Live transcriptions, Ultravox event semantics, Python 1h watchdog, AEC far-end staleness, pipeline speech events, events.jsonl, plus the provider/transport wave) per the AGENTS.md same-PR changelog rule.

* feat(pipeline): pause-and-resume false-interruption handling (Python)

Opt-in agent.barge_in_mode="pause_resume" (default "cancel" keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks:
- PAUSE: gate the sentence/audio send loops on _output_paused and send_clear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused (output is silent, so the line is echo-quiet) and the inbound ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate — the existing _handle_barge_in/_commit_transcript filter family) within barge_in_confirm_ms (default 1500 ms) runs the existing _do_cancel_for_barge_in path and discards the paused buffers. The overlap window anchored at pause time is preserved so InterruptionMetrics.detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (first sentence not fully played, derived from the #164 _playback_buffered_until cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via record_overlap_end(was_interruption=False) — the backchannel counter, never an interruption — plus a false_interruption event. The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate. Turn bodies wait out an in-flight pause decision before ending (bounded by the confirm window) so buffered sentences are never orphaned.

* feat(pipeline): pause-and-resume false-interruption handling (TypeScript)

TypeScript port of 3877814 — exact parity with the Python semantics, defaults, and events (camelCase ↔ snake_case naming):

Opt-in agent.bargeInMode: 'pause_resume' (default 'cancel' keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks:
- PAUSE: gate the sentence/audio send loops on outputPaused and sendClear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused and the inbound ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate — the existing handleBargeIn/commitTranscript filter family) within bargeInConfirmMs (default 1500 ms) runs the existing runBargeInCancel path and discards the paused buffers. The overlap window anchored at pause time is preserved so detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (first sentence not fully played, derived from the #164 playbackBufferedUntil cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via recordOverlapEnd(false) — the backchannel counter, never an interruption — plus a 'false_interruption' event ({ resumedSentences }). The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate. Turn bodies wait out an in-flight pause decision before ending; this completes the predecessor's in-progress port by bounding awaitPauseDecision (confirm window + 5 s fail-open margin, mirroring Python's _await_pause_decision) so a teardown race can never strand the dispatch loop.

Tests mirror tests/unit/test_barge_in_pause_resume.py: pause gates without cancelling, paused buffering + overflow degradation, resume tail replay + false-interruption metrics/event, kill filters (final-only / hallucination / duplicate / frozen-prefix history rewrite), legacy cancel mode untouched, config-off defaults, streaming-loop integration (resume, kill, stream-ends-paused), and teardown mid-pause.
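Both the KILL history rewrite and the RESUME replay reduce to one piece of bookkeeping: how many pushed bytes the caller has actually heard, and which sentence that offset falls in. The sketch below assumes each segment stores the total pushed-byte count at its sentence's first audible chunk, uses PCM16 @ 16 kHz (32 B/ms), and counts a sentence as heard only once all of its bytes have played; `Segment`, `heard_bytes`, `resume_index` and `heard_prefix` are illustrative names, not SDK functions.

```python
from __future__ import annotations

from dataclasses import dataclass

BYTES_PER_MS = 32  # PCM16 @ 16 kHz: 16_000 samples/s * 2 B / 1000 ms


@dataclass(frozen=True)
class Segment:
    text: str
    start: int  # total bytes pushed when this sentence's first audible chunk went out


def heard_bytes(total_pushed: int, backlog_ms: float) -> int:
    """Bytes the caller heard: everything pushed minus what the carrier still buffers."""
    return max(0, total_pushed - int(backlog_ms * BYTES_PER_MS))


def resume_index(segments: list[Segment], total_pushed: int, heard: int) -> int:
    """Index of the first sentence not fully played; RESUME replays it from its start."""
    for i, _seg in enumerate(segments):
        end = segments[i + 1].start if i + 1 < len(segments) else total_pushed
        if heard < end:
            return i
    return len(segments)


def heard_prefix(segments: list[Segment], total_pushed: int, heard: int) -> str:
    """Assumption: only fully played sentences count toward the history rewrite."""
    done = segments[: resume_index(segments, total_pushed, heard)]
    return " ".join(seg.text for seg in done)


segs = [Segment("Sure.", 0), Segment("Your order ships Monday.", 16_000),
        Segment("Anything else?", 64_000)]
total = 80_000                     # 2500 ms of audio pushed to the carrier
heard = heard_bytes(total, 1_000)  # 1000 ms still queued -> 48_000 B heard
print(heard, resume_index(segs, total, heard))  # 48000 1
print(heard_prefix(segs, total, heard) + " [interrupted by caller]")
```

With these numbers a kill rewrites history to `Sure. [interrupted by caller]`, while a resume replays from "Your order ships Monday." because that sentence was cut off partway.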
* feat(python): preemptive generation — speculative LLM+TTS on confident interim transcripts

Opt-in Agent.preemptive_generation (default False): on an interim transcript that ends with sentence-final punctuation or is unchanged for preemptive_min_stable_ms (default 300), pipeline mode starts a speculative dispatch — built-in LLM loop + sentence-chunked TTS — and HOLDS all audio in memory (bounded ~15 s; overflow aborts). When the final transcript commits:
- normalized match → RELEASE: buffered audio flushes to the carrier and the speculative task becomes the live turn; history/metrics record exactly one turn with the final transcript text as the user message, and TTFT/latency anchors are stamped from the REAL commit point (user-perceived timing).
- mismatch → discard via the cancel-event machinery (history untouched) and dispatch normally on the final.

At most one speculation in flight (a newer qualifying interim replaces it); VAD speech_start during speculation aborts silently. The consume loop races the next LLM token against the release signal so a commit mid-token-silence flushes immediately.

New CallMetrics counters preemptive_hits / preemptive_misses (accumulator record_preemptive_hit/_miss).

* fix(python): keep the newer speculation when _start_speculation loses the registration race

_start_speculation awaits the old speculation's unwind (bounded 5 s) before registering its own _SpeculativeTurn. The interim-stability watcher and the STT receive loop both call it, so a second path could register a NEWER speculation during that await — and the resuming caller then overwrote it. The overwritten turn's task was orphaned, parked on its release_event forever: never aborted, never released, never counted as a miss, holding up to ~15 s of buffered audio and an open LLM stream until call teardown.
It also broke the documented at-most-one-speculation invariant (two tasks generating concurrently).

Guard the registration: after the abort settles, yield to any speculation registered concurrently — it always corresponds to the later-arriving interim. Regression test interleaves a concurrent registration into the replacement window. Mirrored in the TS port's startSpeculation.

* feat(typescript): preemptive generation — speculative LLM+TTS on confident interim transcripts

TS port of 238214e at parity: opt-in agent.preemptiveGeneration (default false) + preemptiveMinStableMs (default 300). On an interim that ends with sentence-final punctuation, or is unchanged for the stability window, pipeline mode starts a speculative dispatch — built-in LLM loop + sentence-chunked TTS — and HOLDS all audio in memory (~15 s playout cap; overflow aborts). When the final transcript commits:
- normalized match → RELEASE: buffered audio flushes to the carrier and the speculative task becomes the live turn (tracked via dispatchTask); history/metrics record exactly one turn with the final transcript text as the user message, TTFT/latency anchors stamped from the real commit point (user-perceived timing).
- mismatch → discard via the AbortController machinery (history and carrier untouched) and dispatch normally on the final.

At most one speculation in flight: a newer qualifying interim replaces it (noteInterimTranscript is awaited on the transcript drain loop so replacements serialize — parity with Python's awaited _note_interim_transcript — and startSpeculation yields to a concurrently registered newer speculation, mirroring the Python fix). VAD speech_start during speculation aborts silently; handleStop / handleWsClose tear down without a miss. The token consume loop races the next LLM token against the release decision so a commit mid-token-silence flushes buffered audio immediately.
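The two decisions that drive preemptive generation, when an interim qualifies and whether the committed final matches the speculated text, can be sketched as small pure functions. The normalization rule (lower-case, strip ASCII punctuation, collapse whitespace) is an assumption, since the commits only say "normalized match"; `normalize`, `qualifies` and `decide` are hypothetical helpers, while the 300 ms default comes from `preemptive_min_stable_ms`.

```python
from __future__ import annotations

import re
import string

SENTENCE_FINAL = (".", "?", "!")
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Assumed normalization: lower-case, drop ASCII punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower().translate(_PUNCT)).strip()


def qualifies(interim: str, unchanged_for_ms: float, min_stable_ms: float = 300) -> bool:
    """Start speculating on sentence-final punctuation or a stable interim."""
    text = interim.strip()
    if not text:
        return False
    return text.endswith(SENTENCE_FINAL) or unchanged_for_ms >= min_stable_ms


def decide(speculated_on: str, final: str) -> str:
    """RELEASE held audio on a normalized match; otherwise discard and re-dispatch."""
    return "release" if normalize(speculated_on) == normalize(final) else "discard"


print(qualifies("What time do you open?", 0))                       # True
print(qualifies("what time do you", 120))                           # False
print(decide("What time do you open?", "what time do you open"))    # release
print(decide("What time do you open?", "What time do you close?"))  # discard
```

A looser normalization raises the hit rate but risks releasing a reply to words the caller did not say, which is why a mismatch always falls back to a normal dispatch on the final.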
New CallMetrics counters preemptive_hits / preemptive_misses (recordPreemptiveHit/Miss), mirroring Python.

One TS-specific addition over the straight port: the released speculative task clears dispatchTask in its finally exactly like dispatchTurn's finally does — without this, canSpeculate() (which requires dispatchTask === null; the TS null-on-done convention, vs Python's dispatch.done()) stayed false for the rest of the call after the first hit. Covered by the sequential-two-hits regression test.

Tests mirror Python's test_preemptive_generation.py: immediate start on punctuated interims, stability-window start, release (single LLM call, single history turn, hit counted, no audio before commit), mid-stream release flush, mismatch discard + normal re-dispatch, VAD abort, replacement, same-interim dedupe, buffer overflow, teardown without a miss, speculation gates (speaking / dispatch in flight), default-off.

* feat(pipeline): semantic end-of-utterance detection via smart-turn v3 (opt-in, both SDKs)

Integrate the open pipecat-ai smart-turn v3 ONNX end-of-utterance model as an optional semantic turn detector for pipeline mode in the Python and TypeScript SDKs.

Design:
- Provider: SmartTurnDetector (getpatter/providers/smart_turn.py, src/providers/smart-turn.ts) implements the new TurnDetectorProvider interface (threshold / predict(pcm16-16k window) / close). The Whisper log-mel preprocessing (reflect-padded 400-pt STFT, 80 Slaney mel filters, last-8s left-padded window, zero-mean/unit-variance normalize) is ported natively in each SDK; cross-SDK numeric parity is locked by a reference-value test generated from the Python implementation.
- Wiring: Agent.turn_detector / agent.turnDetector, off by default — the speech_end path is unchanged when unset.
On a VAD speech_end the handler scores the rolling 8 s caller-audio window: probability >= threshold finalizes STT immediately (end-of-turn fires early); below threshold the finalize is HELD and re-scored every ~200 ms of further silence, capped by Agent.max_semantic_hold_ms / maxSemanticHoldMs (default 1200 ms, then plain vad_silence). A frame-driven poll plus a generation-guarded wall-clock backstop guarantee the cap even if inbound audio stalls; a VAD speech_start or an STT-side transcript commit cancels the hold.
- Speech events: pipeline mode now fires on_user_speech_eos (only when a detector is configured — zero behavior change otherwise) with trigger EouTrigger.SEMANTIC_TURN_DETECTOR when the model decided the commit vs EouTrigger.VAD_SILENCE otherwise.
- Graceful degradation: onnxruntime/numpy stay optional (the getpatter[turn-detector] extra; onnxruntime-node optionalDependency), imported lazily. SmartTurnDetector.maybe_load() / maybeLoad() warns once and returns None/undefined when the runtime or the model file (PATTER_SMART_TURN_MODEL or model_path) is unprovisioned, so the agent runs plain VAD-silence endpointing instead of crashing; load() keeps fail-fast errors with install/download instructions. At call time the handler fails open AND fails once: the first predict error logs a single warning and disables the detector for the rest of the call (the existing vadDisabled pattern).
- Model weights are NOT bundled (~30 MB); users download them from https://huggingface.co/pipecat-ai/smart-turn-v3.

Also fixes a scratch-buffer aliasing bug in the TS mixed-radix FFT base case (every n=50 sub-transform corrupted its even half and then overwrote it with the odd half), caught by the Python-generated reference-value parity test.

Tests: pytest 2382 passed / exit 0 (ONNX session is the only mocked boundary, tagged @pytest.mark.mocked); vitest 1896 passed / exit 0 (*.mocked.test.ts twins); tsc --noEmit clean.
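The hold policy on VAD speech_end reads as a small decision loop: commit as soon as the end-of-turn probability reaches the threshold, otherwise keep holding and re-score about every 200 ms of further silence until `max_semantic_hold_ms` (default 1200 ms) elapses, after which plain vad_silence endpointing wins. In this sketch a caller-supplied `score(silence_ms)` callable stands in for `predict()` on the 8 s window, the 0.5 threshold is illustrative only, and cancellation by a new speech_start or an STT-side commit is not modeled; `endpoint` is a hypothetical function, not SDK code.

```python
from __future__ import annotations

from typing import Callable

RESCORE_EVERY_MS = 200


def endpoint(score: Callable[[int], float], threshold: float = 0.5,
             max_hold_ms: int = 1200) -> tuple[str, int]:
    """Return (trigger, silence_ms_at_commit) for one VAD speech_end.

    "semantic_turn_detector" when the model decided the commit,
    "vad_silence" when the hold cap expired first.
    """
    silence_ms = 0
    while silence_ms < max_hold_ms:
        if score(silence_ms) >= threshold:
            return "semantic_turn_detector", silence_ms
        silence_ms += RESCORE_EVERY_MS  # keep holding; re-score after more silence
    return "vad_silence", max_hold_ms


# A complete question scores high at once; a trailing "um, and..." never does.
print(endpoint(lambda ms: 0.9))                       # ('semantic_turn_detector', 0)
print(endpoint(lambda ms: 0.2))                       # ('vad_silence', 1200)
print(endpoint(lambda ms: 0.3 if ms < 400 else 0.7))  # ('semantic_turn_detector', 400)
```

The cap bounds the worst case: even a model that never becomes confident adds at most `max_semantic_hold_ms` on top of VAD silence.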
* fix(speech-events): single committed-EOS emission at transcript commit in pipeline mode

Integration reconciliation between the unconditional pipeline speech events (this branch) and the semantic turn detector's EOS stamping (the smart-turn feature, written against a base where pipeline EOS never fired):
- Pipeline EOS fires exactly once per committed turn, AT transcript commit (the analogue of Realtime's input_audio_buffer.committed) — before the hook veto and handler-availability checks — covering the on_message path and orphaned turns that the old emission point next to record_turn_committed missed.
- The semantic detector's stamped trigger is consumed at that single point (semantic_turn_detector | vad_silence | manual_commit); the duplicate emission the feature carried is removed in both SDKs.
- TS emitUserSpeechEos gains the vad_silence/manual_commit resolution Python already had (it hardcoded vad_silence) and an explicit-trigger arg for the Realtime path.
- Released speculative turns (preemptive generation) bypass the dispatch path entirely: the release commit now performs the same semantic cleanup + EOS emission so combining the two opt-ins neither leaks a stale stamped trigger nor skips the event.
- Detector tests updated to the merged contract (EOS always fires; only the trigger differs).

* feat(recording): Python — carrier-neutral local call recording (SDK-side stereo WAV)

serve(local_recording=True) records every call from the media stream the SDK already proxies — no carrier recording API, no recording fees, audio never leaves the process. Works on Twilio, Telnyx and Plivo in every engine mode (pipeline / OpenAI Realtime / ElevenLabs ConvAI); independent of the carrier-side `recording` flag and off by default.
- New getpatter/audio/call_recorder.py: LocalCallRecorder writes an interleaved stereo WAV (left=caller, right=agent — the QA-standard layout), 16-bit PCM @ 16 kHz; mulaw 8k / pcm16 8k / pcm16 24k inputs are decoded per channel with stateful resamplers. Caller-clocked alignment: inbound PSTN frames are the wall clock, agent TTS bursts drain at that rate from a bounded FIFO (60 s cap, overflow force-flushed), the idle channel is zero-padded.
- Hot-path safe: 64 KiB buffered writes (no per-frame disk I/O), bounded memory, any I/O error disables the recorder without touching the call.
- Placeholder RIFF header is patched on close(); every handler cleanup path (including abnormal carrier WS drops) finalizes, so truncated calls still yield parseable WAVs.
- Wiring: EmbeddedServer.create_local_recorder resolves the target path (explicit dir string > call-log dir next to metadata.json/transcript.jsonl > ./recordings fallback); the three telephony bridges attach the recorder before handler start and surface `recording_path` in the on_call_end payload; CallLogger.log_call_end persists it in metadata.json. Because the WAV lives in the per-call log directory, PATTER_LOG_RETENTION_DAYS sweeps recordings too.
- Tests: WAV header/channel/length round-trips via stdlib wave, both-direction capture, caller-clocked alignment + silence padding, encoding decodes, bounded backlog, buffered-write batching, abnormal-teardown finalization, idempotent close, path resolution + sanitization, bridge-level recording_path surfacing, retention sweep covering recordings, and config-off ⇒ zero filesystem writes.

* feat(recording): TypeScript — carrier-neutral local call recording (parity with Python)

serve({ localRecording: true }) records every call SDK-side as a stereo WAV (left=caller, right=agent), 16-bit PCM @ 16 kHz — same defaults, layout, payload keys and metadata shape as the Python SDK.
- New src/audio/call-recorder.ts: LocalCallRecorder with per-channel decode (mulaw_8k / pcm16_8k / pcm16_24k / pcm16_16k → PCM16 16 kHz via stateful resamplers), caller-clocked alignment with a bounded 60 s agent FIFO, 64 KiB batched writeSync (no per-frame disk I/O), and a placeholder RIFF header patched on close(). Exported from the package index (mirrors Python's importable getpatter.audio.call_recorder).
- StreamHandler taps: caller audio at the top of handleAudio (above every engine-mode guard, wire codec from bridge.inputWireFormat); agent audio in encodePipelineAudio (the single chokepoint for all pipeline sends, which now decodes the carrier-native μ-law fast path instead of skipping it) and in onAdapterAudio for Realtime/ConvAI (μ-law wire, PCM16 16 kHz for non-negotiated ConvAI).
- fireCallEnd finalizes the WAV on both teardown funnels (handleStop and the abnormal handleWsClose) and surfaces `recording_path` in the onCallEnd payload; EmbeddedServer.makeLocalRecorder resolves the target path (explicit dir > call-log dir > ./recordings fallback) and CallLogger.logCallEnd persists recording_path in metadata.json, with callDir made public so the WAV lands next to transcript.jsonl and is covered by the PATTER_LOG_RETENTION_DAYS sweep.
- Tests: real WAV byte round-trips (header fields, stereo mapping, sample rate, lengths), both-direction capture through the live handler taps, alignment + silence padding, encodings, bounded backlog, buffered-write batching, abnormal-teardown finalization, idempotent close, makeLocalRecorder path resolution + sanitization, retention sweep covering recordings, and config-off ⇒ zero writes / no recording_path key.
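The per-channel decode step takes carrier-native μ-law at 8 kHz and produces PCM16 at 16 kHz through a stateful resampler. It can be illustrated with the standard G.711 μ-law expansion followed by a 2x linear-interpolation upsampler that carries its last sample across chunks. This is a sketch under stated assumptions: the SDKs' actual resampler design is not described in the commit, `ulaw_to_pcm16`, `Upsample2x` and `decode_mulaw_8k` are hypothetical names, and linear interpolation stands in for whatever filter the real code uses.

```python
from typing import List, Optional


def ulaw_to_pcm16(byte: int) -> int:
    """Standard G.711 mu-law expansion of one byte to a signed 16-bit sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the 132 bias
    return -magnitude if sign else magnitude


class Upsample2x:
    """Stateful 8 kHz -> 16 kHz linear interpolator (hypothetical).

    The last input sample is kept between calls so interpolation continues
    across carrier frame boundaries instead of restarting at each chunk.
    """

    def __init__(self) -> None:
        self._prev: Optional[int] = None

    def process(self, samples: List[int]) -> List[int]:
        out: List[int] = []
        for s in samples:
            prev = s if self._prev is None else self._prev
            out.append((prev + s) // 2)  # midpoint between previous and current sample
            out.append(s)
            self._prev = s
        return out


def decode_mulaw_8k(chunk: bytes, resampler: Upsample2x) -> List[int]:
    """mu-law @ 8 kHz -> PCM16 samples @ 16 kHz; use one resampler per channel."""
    return resampler.process([ulaw_to_pcm16(b) for b in chunk])
```

Keeping `_prev` between calls is what makes the resampler stateful. Without it, each short carrier frame would start interpolating from its own first sample, which could introduce a discontinuity at every chunk boundary.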
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* docs(changelog): local_recording / localRecording - carrier-neutral local call recording

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(python): warm transfer + multi-agent handoff

Two related in-call capabilities, both opt-in with zero behavior change when unused:

WARM TRANSFER: transfer_call / CallControl.transfer gain carrier-neutral mode ("cold" default, byte-identical blind redirect | "warm") + summary. Warm mode on Twilio parks the caller in a per-call conference on hold music, dials the human agent with the summary spoken first (<Say>), then bridges the two as the AI leg ends. New signature-validated, fail-closed webhooks: /webhooks/twilio/conference (lifecycle observability) and /webhooks/twilio/warm-status (releases a caller stuck on hold when the human never answers). Telnyx/Plivo return a clear {error} envelope and keep the AI on the line (never a silent blind-redirect fallback). Invalid modes are rejected with an error envelope on every path.

MULTI-AGENT HANDOFF: agent(handoffs={name: Agent}) injects a built-in handoff_to(name, reason?) tool (names enum-constrained). Calling it (or PipelineStreamHandler._perform_handoff programmatically) swaps the live call to the target agent's system prompt, tools, variables, guardrails, text transforms, consult tool, and onward handoffs; history is preserved, and a [handoff] system line is recorded and never replayed as a fabricated user turn. Pipeline mode: LLMLoop.update_agent swaps prompt + tool list for the next turn. Realtime mode: new OpenAIRealtimeAdapter.update_session sends a partial session.update (the GA adapter adds the mandatory "type": "realtime" discriminator) BEFORE the function result, so the next response already runs as the target. Unknown targets / malformed args return error envelopes (never silence).
Audio infra established at call start (STT/TTS/engine connection, hence voice on engines that cannot switch mid-session) is retained; chained handoffs follow the target map.

Tests: tests/test_handoff.py + tests/unit/test_warm_transfer_unit.py are authentic, mocking only the carrier REST boundary (@pytest.mark.mocked).

https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4

* feat(typescript): warm transfer + multi-agent handoff, full parity with Python

Mirrors the Python SDK feature-for-feature (snake_case <-> camelCase):

WARM TRANSFER: TRANSFER_CALL_TOOL gains mode ("cold" | "warm") + summ…
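Two parts of the warm-transfer and handoff contract work well as a short Python sketch. The first is the Twilio TwiML that parks the caller in a per-call conference on hold music and speaks the summary to the human agent before they join. The second is an enum-constrained `handoff_to` tool whose resolver answers unknown targets or malformed arguments with an `{error}` envelope. The function names, the JSON-schema layout of the tool and the exact conference attributes are illustrative assumptions, not the SDKs' code. `<Dial>`, `<Conference>`, `<Say>` and the `waitUrl` / `startConferenceOnEnter` / `endConferenceOnExit` attributes are standard TwiML.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional


def _conference_twiml(conference: str, attrs: Dict[str, str], say: Optional[str] = None) -> str:
    response = ET.Element("Response")
    if say is not None:
        ET.SubElement(response, "Say").text = say  # spoken before the leg joins
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(dial, "Conference", attrs).text = conference
    return ET.tostring(response, encoding="unicode")


def caller_hold_twiml(conference: str, hold_music_url: str) -> str:
    # The caller hears waitUrl audio until a participant that starts the conference joins.
    return _conference_twiml(conference, {
        "waitUrl": hold_music_url,
        "startConferenceOnEnter": "false",
        "endConferenceOnExit": "true",
    })


def human_agent_twiml(conference: str, summary: str) -> str:
    # The human agent hears the summary first, then joins and starts the conference.
    return _conference_twiml(
        conference,
        {"startConferenceOnEnter": "true", "endConferenceOnExit": "true"},
        say=summary,
    )


def check_transfer_mode(mode: str, carrier: str) -> Optional[Dict[str, str]]:
    """Return an error envelope instead of silently falling back to a blind redirect."""
    if mode not in ("cold", "warm"):
        return {"error": f"invalid transfer mode: {mode!r}"}
    if mode == "warm" and carrier != "twilio":
        return {"error": f"warm transfer is not supported on {carrier}"}
    return None


def handoff_tool(handoffs: Dict[str, Any]) -> Dict[str, Any]:
    """Function-calling schema whose `name` argument is limited to known targets."""
    return {
        "name": "handoff_to",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "enum": sorted(handoffs)},
                "reason": {"type": "string"},
            },
            "required": ["name"],
        },
    }


def resolve_handoff(handoffs: Dict[str, Any], args: Any) -> Dict[str, Any]:
    """Return the target agent or an error envelope; never None (never silence)."""
    if not isinstance(args, dict) or not isinstance(args.get("name"), str):
        return {"error": "malformed handoff arguments"}
    if args["name"] not in handoffs:
        return {"error": f"unknown handoff target: {args['name']!r}"}
    return {"agent": handoffs[args["name"]], "reason": args.get("reason")}
```

The enum in the schema keeps well-behaved models on valid targets. The resolver still checks, because tool arguments arrive as untrusted model output.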
Summary
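When an agent-runtime LLM (HermesLLM/OpenClawLLM) delivers the whole reply at once after a long thinking pause, TTS outruns realtime and the carrier ends up holding tens of seconds of queued audio. The speaking state, however, ended `PATTER_TTS_TAIL_GRACE_MS` (1.5 s) after the last push, not the last playback. For most of the audible reply `_is_speaking` was therefore already false: user speech was classified as a calm next turn, `send_clear` never fired, and the buffered reply played on (with the next turn's answer queued behind it).

Implementation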
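The fix replaces the push-anchored timer with an estimated playback cursor. Below is a minimal Python model of that cursor and of the two-phase end-of-speaking described in the bullets that follow. It is a hypothetical simplification, not the code in `stream_handler.py`: `PlaybackCursorSketch`, `end_speaking_plan`, the encoding keys and the injectable clock are assumptions, and real carrier jitter or drift is ignored. Only the byte rates (32 B/ms and 8 B/ms) come from the PR.

```python
import time
from typing import Callable, List, Optional, Tuple

# Byte rates quoted in the PR: PCM16 @ 16 kHz and carrier-native mu-law @ 8 kHz.
BYTES_PER_MS = {"pcm16_16000": 32, "ulaw_8000": 8}


class PlaybackCursorSketch:
    """Estimate when the carrier will finish playing everything pushed so far."""

    def __init__(self, encoding: str, clock: Callable[[], float] = time.monotonic) -> None:
        self._bytes_per_ms = BYTES_PER_MS[encoding]
        self._clock = clock  # returns seconds
        self._buffered_until_ms: Optional[float] = None

    def _now_ms(self) -> float:
        return self._clock() * 1000.0

    def track_outbound(self, chunk: bytes) -> None:
        """Advance the cursor by the chunk's playback duration."""
        now = self._now_ms()
        # If the carrier has already drained, this chunk starts playing now.
        start = now if self._buffered_until_ms is None else max(now, self._buffered_until_ms)
        self._buffered_until_ms = start + len(chunk) / self._bytes_per_ms

    def backlog_ms(self) -> float:
        if self._buffered_until_ms is None:
            return 0.0
        return max(0.0, self._buffered_until_ms - self._now_ms())

    def reset(self) -> None:
        """On barge-in, send_clear drops the carrier buffer, so nothing is pending."""
        self._buffered_until_ms = None


def end_speaking_plan(backlog_ms: float, tail_grace_ms: float) -> List[Tuple[str, float]]:
    """States entered after the last push, each with its duration in ms."""
    if tail_grace_ms <= 0:
        return [("not_speaking", 0.0)]  # legacy synchronous flip (escape hatch)
    plan: List[Tuple[str, float]] = []
    if backlog_ms > 0:
        # Phase 1: is_speaking stays true, tail grace inactive, barge-in armed.
        plan.append(("speaking_barge_in_armed", backlog_ms))
    plan.append(("tail_grace", tail_grace_ms))  # Phase 2: unchanged echo-tail grace
    plan.append(("not_speaking", 0.0))
    return plan
```

With no backlog, the plan collapses to the tail grace alone, which is the old behavior. A 20 s PCM16 reply (640,000 bytes at 32 B/ms) pushed in one burst instead keeps barge-in armed for about 20 s rather than 1.5 s.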
- Estimated playback cursor `_playback_buffered_until` (Py) / `playbackBufferedUntil` (TS), advanced by `_track_outbound_playback` / `trackOutboundPlayback` on every pipeline TTS chunk pushed to the carrier, at the chunk's real byte rate: PCM16 @ 16 kHz = 32 B/ms (default path, Telnyx native) or carrier-native μ-law @ 8 kHz = 8 B/ms (Twilio/Plivo `ulaw_8000`).
- Two-phase `_end_speaking_with_grace` / `endSpeakingWithGrace`: phase 1 holds `_is_speaking=true` with `_tail_grace_active=false` for the whole estimated backlog, keeping VAD/transcript barge-in armed on the cancel path (full cancel + `send_clear`, which drops the carrier buffer instantly, per Twilio media-stream `clear` semantics); phase 2 is the unchanged echo-tail grace.
- Cursor reset on barge-in cancel (`_do_cancel_for_barge_in` now also clears the pending grace task; TS already invalidates it via the generation bump) and on `endTailGraceForNewTurn`.
- Files: `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`, mirrored unit tests, `CHANGELOG.md`.

Breaking change?
No. No new config; no default changes.
`PATTER_TTS_TAIL_GRACE_MS=0` still forces the legacy synchronous flip (tests/soak escape hatch).

Test plan
- `pytest tests/ -m 'not soak'`: 2371 passed, 0 failed (incl. new `tests/unit/test_pipeline_bargein_buffered.py`, 11 tests, RED without the fix: 10/11 fail)
- `npm test`: 1835 passed, 0 failed (incl. new `tests/unit/pipeline-bargein-buffered.test.ts`, 12 tests, RED without the fix: 12/12 fail) + `npm run lint` + `npm run build`
- Python/TS parity: mirrored names (`_playback_buffered_until` ↔ `playbackBufferedUntil`), defaults and tests in both SDKs

Docs updates
- `CHANGELOG.md` (Unreleased → Fixed entry added). Integration docs already describe barge-in as Patter-owned.

🤖 Generated with Claude Code