PatterAI · nicolotognoni · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -87,6 +87,8 @@
   - **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn.
   - **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
 - **Forward-STT-without-AEC no longer self-interrupts on its own echo.** The remaining live Hermes/OpenClaw barge-in failure: with `PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC, and no `barge_in_strategies`, a VAD `speech_start` during TTS cancelled the turn immediately — but on a no-AEC link that `speech_start` is very often the agent's *own* TTS echo (or pre-first-token line noise during a long tool-running turn). The result was a cascade of false-positive interruptions: a short normal reply like "bene bene" produced `agent_text='[interrupted]'` with `bargein_ms≈0`, and the next turn's LLM ran for seconds but emitted `tts_characters=0` because it was torn down before its first token. The echo guard existed only on the *transcript* path, so the raw VAD-energy cancel had no protection. The VAD-energy cancel is now **deferred to transcript confirmation** whenever audio is forwarded during TTS without AEC (`forward_stt_while_speaking && aec is None`), exactly as it already was when `barge_in_strategies` are configured: the `speech_start` marks the barge-in *pending* (the agent keeps talking) and the cancel only fires once `_handle_barge_in` / `handleBargeIn` sees a real transcript that survives the echo guard; if none confirms within `barge_in_confirm_ms` (default 1500 ms) the agent resumes its sentence. The default VAD path and forward-STT *with* AEC keep the responsive immediate cancel — no behaviour change for existing configs. For the cleanest short-echo handling, still pair with `echo_cancellation=True` or `barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
+- **Barge-in now works while the carrier is still PLAYING a long buffered reply — the "Hermes detects the interruption but keeps talking" bug.** The pipeline pushes TTS audio to the carrier as fast as the provider synthesizes it (no pacing) while the carrier buffers and plays at realtime. With a token-paced LLM the two stay roughly in sync, but an agent-runtime LLM (`HermesLLM` / `OpenClawLLM`) delivers its whole — often long — reply at once after the thinking pause: TTS outruns realtime and the carrier ends up holding tens of seconds of queued audio. The handler's speaking state ended a fixed `PATTER_TTS_TAIL_GRACE_MS` (1.5 s) after the last *push*, not the last *playback* — so for most of the audible reply `_is_speaking` was already false, every VAD `speech_start` / transcript was treated as a calm next turn instead of a barge-in, `send_clear` was never sent, and the buffered audio kept playing over the caller (with the next turn's reply queued behind it). The handler now tracks an **estimated playback cursor** (`_playback_buffered_until` / `playbackBufferedUntil`, advanced per pushed chunk at the chunk's real byte rate — PCM16@16kHz or carrier-native μ-law@8kHz) and `_end_speaking_with_grace` waits in two phases: phase 1 keeps `_is_speaking=true` with `_tail_grace_active=false` for the whole estimated backlog (barge-in stays armed and takes the full cancel + `send_clear` path, which drops the carrier buffer instantly); phase 2 is the unchanged echo-tail grace. Barge-in cancels reset the cursor (the buffer was just cleared). No new config; token-paced LLMs (no backlog) behave byte-identically to before, and `PATTER_TTS_TAIL_GRACE_MS=0` still forces the legacy synchronous flip. This is the industry-standard semantics (stop + flush client-side regardless of LLM state — cf. Twilio media-stream `clear`). `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
+- **Interrupted-turn history now records the reply prefix the caller actually HEARD (LiveKit-style truncation), not everything the LLM generated.** Builds on the playback cursor above. Two gaps closed: (a) on a **mid-turn** barge-in with an agent-runtime LLM, the whole reply had already been synthesized into the carrier buffer, so the `[interrupted by caller]` marker was appended to the FULL text — a stateful runtime (Hermes/OpenClaw) believed the caller heard everything; (b) on a barge-in landing **after the turn completed** (while the carrier still played the buffered tail) no marker was applied at all. The handler now tracks per-turn `(sentence, playback_start)` segments (`_turn_spoken_segments` / `turnSpokenSegments`; filler and `llm_error_message` audio advance the clock but add no segment) and maps `heard = total_pushed − carrier_backlog` to a sentence-granular prefix: the streaming path records `<heard prefix> [interrupted by caller]`, and the post-complete cancel paths rewrite the last assistant history entry the same way before clearing the buffer. No new config; with no tracked segments (e.g. no TTS) the legacy full-text marker is preserved. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
 - **(Python) Twilio/Plivo mark frames now carry the caller-supplied name — first-message pacing no longer burns the mark-await timeout on every call.** `TwilioAudioSender.send_mark` (and the Plivo checkpoint equivalent) discarded the `mark_name` argument and sent a locally generated `audio_N` instead, so the `fm_N` echo the first-message pacer waited for never matched and every mark resolved via the 0.5 s fallback timeout (~1.5 s of guaranteed extra latency in the barge-in window of every Twilio call). The wire name is now the caller's, matching the TypeScript behaviour. `libraries/python/getpatter/telephony/twilio.py`, `.../telephony/plivo.py`.
 - **(TypeScript) Inbound audio frames are now awaited — a transient audio-path error can no longer kill the whole server.** All three carrier WS message handlers called `handler.handleAudio(...)` without `await`, so a rejection inside the audio path (VAD, resampler, STT send) escaped the surrounding `try/catch` and became an unhandled rejection, which terminates the Node process (Node 15+) together with every active call. `libraries/typescript/src/server.ts`.
 - **(TypeScript) Telnyx calls no longer leak `activeCallIds` entries.** The Telnyx WS close handler was the only one of the three carriers that never deleted its `ws → call_control_id` map entry, so the map grew for the server's lifetime and graceful shutdown issued hangup REST calls for long-dead calls. `libraries/typescript/src/server.ts`.
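Note on the two added entries: the playback-cursor and heard-prefix bookkeeping they describe can be sketched as below. This is a minimal illustration, not the code in `stream_handler.py`. The class `PlaybackEstimator` and its method names are hypothetical, and the rule that a sentence counts as heard once its playback has started is an assumption (the changelog only says the prefix is sentence-granular).

```python
from dataclasses import dataclass, field

# Byte rates for the two formats the changelog names.
PCM16_16K_BPS = 16_000 * 2  # 16-bit mono PCM at 16 kHz
MULAW_8K_BPS = 8_000 * 1    # 8-bit mu-law at 8 kHz


@dataclass
class PlaybackEstimator:
    """Hypothetical helper that estimates what the carrier has played so far."""

    buffered_until: float = 0.0  # estimated clock time the carrier buffer drains
    total_pushed: float = 0.0    # seconds of audio pushed this turn
    segments: list = field(default_factory=list)  # (sentence, playback_start)

    def push(self, now, n_bytes, bytes_per_sec, sentence=None):
        """Advance the cursor by one pushed chunk at its real byte rate."""
        duration = n_bytes / bytes_per_sec
        if sentence is not None:
            # Filler audio passes sentence=None: it advances the clock only.
            self.segments.append((sentence, self.total_pushed))
        # The chunk plays after the existing backlog, or right away if idle.
        self.buffered_until = max(self.buffered_until, now) + duration
        self.total_pushed += duration

    def backlog(self, now):
        """Seconds of audio still queued at the carrier (never negative)."""
        return max(0.0, self.buffered_until - now)

    def heard_prefix(self, now):
        """Sentences whose playback had started by `now` (assumed rule)."""
        heard = self.total_pushed - self.backlog(now)
        return " ".join(s for s, start in self.segments if start < heard)

    def clear(self, now):
        """A barge-in cancel flushed the carrier buffer: reset the cursor."""
        self.buffered_until = now


# An agent-runtime LLM delivers three sentences almost at once.
est = PlaybackEstimator()
est.push(0.0, 64_000, PCM16_16K_BPS, "First sentence.")   # 2.0 s of audio
est.push(0.0, 96_000, PCM16_16K_BPS, "Second sentence.")  # 3.0 s of audio
est.push(0.1, 64_000, PCM16_16K_BPS, "Third sentence.")   # 2.0 s of audio

# The caller barges in 2.5 s into playback, long after the last push.
still_speaking = est.backlog(2.5) > 0
history_entry = f"{est.heard_prefix(2.5)} [interrupted by caller]"
est.clear(2.5)
```

In this sketch the speaking state stays armed while `backlog(now) > 0`, which corresponds to phase 1 of `_end_speaking_with_grace`; the echo-tail grace of phase 2 is not modelled.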