PatterAI · nicolotognoni · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -87,6 +87,19 @@
   - **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn.
   - **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
 - **Forward-STT-without-AEC no longer self-interrupts on its own echo.** The remaining live Hermes/OpenClaw barge-in failure: with `PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC, and no `barge_in_strategies`, a VAD `speech_start` during TTS cancelled the turn immediately — but on a no-AEC link that `speech_start` is very often the agent's *own* TTS echo (or pre-first-token line noise during a long tool-running turn). The result was a cascade of false-positive interruptions: a short normal reply like "bene bene" produced `agent_text='[interrupted]'` with `bargein_ms≈0`, and the next turn's LLM ran for seconds but emitted `tts_characters=0` because it was torn down before its first token. The echo guard existed only on the *transcript* path, so the raw VAD-energy cancel had no protection. The VAD-energy cancel is now **deferred to transcript confirmation** whenever audio is forwarded during TTS without AEC (`forward_stt_while_speaking && aec is None`), exactly as it already was when `barge_in_strategies` are configured: the `speech_start` marks the barge-in *pending* (the agent keeps talking) and the cancel only fires once `_handle_barge_in` / `handleBargeIn` sees a real transcript that survives the echo guard; if none confirms within `barge_in_confirm_ms` (default 1500 ms) the agent resumes its sentence. The default VAD path and forward-STT *with* AEC keep the responsive immediate cancel — no behaviour change for existing configs. For the cleanest short-echo handling, still pair with `echo_cancellation=True` or `barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
+- **(Python) Twilio/Plivo mark frames now carry the caller-supplied name — first-message pacing no longer burns the mark-await timeout on every call.** `TwilioAudioSender.send_mark` (and the Plivo checkpoint equivalent) discarded the `mark_name` argument and sent a locally generated `audio_N` instead, so the `fm_N` echo the first-message pacer waited for never matched and every mark resolved via the 0.5 s fallback timeout (~1.5 s of guaranteed extra latency in the barge-in window of every Twilio call). The wire name is now the caller's, matching the TypeScript behaviour. `libraries/python/getpatter/telephony/twilio.py`, `.../telephony/plivo.py`.
+- **(TypeScript) Inbound audio frames are now awaited — a transient audio-path error can no longer kill the whole server.** All three carrier WS message handlers called `handler.handleAudio(...)` without `await`, so a rejection inside the audio path (VAD, resampler, STT send) escaped the surrounding `try/catch` and became an unhandled rejection, which terminates the Node process (Node 15+) together with every active call. `libraries/typescript/src/server.ts`.
+- **(TypeScript) Telnyx calls no longer leak `activeCallIds` entries.** The Telnyx WS close handler was the only one of the three carriers that never deleted its `ws → call_control_id` map entry, so the map grew for the server's lifetime and graceful shutdown issued hangup REST calls for long-dead calls. `libraries/typescript/src/server.ts`.
+- **Pipeline context rebuild no longer replays tool history entries as fabricated user turns.** `role="tool"` transcript entries (display/dashboard artefacts) were mapped to `role: "user"` when rebuilding the OpenAI messages array, injecting raw tool JSON into the conversation as if the caller had said it. They are now skipped — the tool outcome is already reflected in the assistant's following reply. `libraries/python/getpatter/services/llm_loop.py`, `libraries/typescript/src/llm-loop.ts`.
+- **(Python) Removed deprecated `asyncio.get_event_loop()` calls from running-loop contexts.** `_send_mark_awaitable` now uses `get_running_loop()`, and the ElevenLabs WS TTS barge-in cancel schedules the close on the loop captured at stream start (with a done-callback that surfaces close errors) instead of relying on `get_event_loop()` + `ensure_future`, which raises on Python 3.12+ when no loop is current on the calling thread. `libraries/python/getpatter/stream_handler.py`, `.../providers/elevenlabs_ws_tts.py`.
+- **(Python) The pre-barge-in inbound audio ring is now a `deque(maxlen=13)`.** The previous `list` + `pop(0)` eviction was O(n) and ran per media frame (~50/s) while the agent spoke; the flush path also snapshots before awaiting so a concurrent frame can't mutate the buffer mid-replay. `libraries/python/getpatter/stream_handler.py`.
+
+### Security
+
+- **Telnyx webhook anti-replay window aligned across SDKs: future-dated timestamps are rejected beyond a 30 s clock-skew allowance.** Python previously accepted timestamps up to 5 minutes in the FUTURE (`abs(age)` check), doubling the replay window for pre-issued signatures; TypeScript previously rejected ANY future timestamp, dropping legitimate webhooks on hosts whose clock lags Telnyx by even a second. Both now accept `[-30 s, +300 s]` of age. `libraries/python/getpatter/server.py`, `libraries/typescript/src/server.ts`.
+- **(TypeScript) `sanitizeVariables` now strips control characters and caps values at 500 chars (parity with Python).** Caller-supplied template variables (carrier custom params) are interpolated into the system prompt; a newline-bearing value could append adversarial prompt lines. `libraries/typescript/src/server.ts`.
+- **(TypeScript) LLM API error bodies are truncated to 200 chars in logs.** Provider 401 bodies have been observed to embed the rejected API-key prefix; the thrown error message was already capped but the ERROR-level log line was not. `libraries/typescript/src/llm-loop.ts`.
+- **Dependency advisories resolved via lockfile bump (`npm audit fix`): `hono` → 4.12.21+ (4 moderate advisories: IPv6 deny-rule bypass, Set-Cookie injection, JWT scheme laxness, percent-encoded mount routing) and `vitest` → 3.2.6+ (dev-only critical, Vitest UI arbitrary file read).** `libraries/typescript/package-lock.json`.
 
 ## 0.6.5 (2026-06-05)
 

diff --git a/libraries/python/getpatter/providers/elevenlabs_ws_tts.py b/libraries/python/getpatter/providers/elevenlabs_ws_tts.py
@@ -234,6 +234,12 @@ def __init__(
         # ``None`` when no synthesis is in progress.
         # Parity with TS ``ElevenLabsWebSocketTTS.activeStreamWs``.
         self._active_stream_ws: object = None
+        # Event loop the in-flight synthesis runs on, captured alongside
+        # ``_active_stream_ws``. ``cancel_active_stream`` may be invoked
+        # from sync context / another thread, where
+        # ``asyncio.get_running_loop()`` is unavailable and the deprecated
+        # ``get_event_loop()`` raises on Python 3.12+.
+        self._stream_loop: object = None
 
     @property
     def api_key(self) -> str:
@@ -369,18 +375,32 @@ def cancel_active_stream(self) -> None:
         if ws is None:
             return
         self._active_stream_ws = None
+        loop = self._stream_loop
+        self._stream_loop = None
         try:
             # ``websockets`` connection objects are asyncio-aware; close()
-            # schedules the close on the running event loop.  We fire-and-
+            # must run on the loop the stream was opened on. We fire-and-
             # forget here because cancel_active_stream is called from sync
-            # context (signal handler / barge-in cancel path).
-            import asyncio
-
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                loop.call_soon_threadsafe(
-                    lambda: asyncio.ensure_future(ws.close())  # type: ignore[attr-defined]
+            # context (signal handler / barge-in cancel path), possibly from
+            # a different thread — hence ``call_soon_threadsafe`` against the
+            # loop captured at stream start rather than the deprecated
+            # ``asyncio.get_event_loop()`` (which raises on 3.12+ when no
+            # loop is current on the calling thread).
+            def _schedule_close() -> None:
+                task = asyncio.get_running_loop().create_task(
+                    ws.close()  # type: ignore[attr-defined]
                 )
+                task.add_done_callback(_log_close_failure)
+
+            def _log_close_failure(task: asyncio.Task) -> None:
+                if task.cancelled():
+                    return
+                exc = task.exception()
+                if exc is not None:
+                    logger.debug("ElevenLabs WS close failed: %s", exc)
+
+            if loop is not None and not loop.is_closed():  # type: ignore[attr-defined]
+                loop.call_soon_threadsafe(_schedule_close)  # type: ignore[attr-defined]
             else:
                 asyncio.run(ws.close())  # type: ignore[attr-defined]
         except Exception:
@@ -498,6 +518,7 @@ async def synthesize(self, text: str) -> AsyncGenerator[bytes, None]:
             # force-close it and unblock the ``await ws.recv()`` below.
             # Parity with TS ``ElevenLabsWebSocketTTS.activeStreamWs``.
             self._active_stream_ws = ws
+            self._stream_loop = asyncio.get_running_loop()
 
             while True:
                 try:
@@ -566,6 +587,7 @@ async def synthesize(self, text: str) -> AsyncGenerator[bytes, None]:
             # local binding and the close below is idempotent.
             if self._active_stream_ws is ws:
                 self._active_stream_ws = None
+                self._stream_loop = None
             # Best-effort: tell the server to stop synthesising any
             # buffered text the consumer is no longer interested in.
             # Failure to send is non-fatal — the socket close below

diff --git a/libraries/python/getpatter/server.py b/libraries/python/getpatter/server.py
@@ -221,6 +221,12 @@ def _telnyx_hangup_outcome(cause: str) -> str | None:
     return None
 
 
+# Maximum tolerated clock skew for a Telnyx webhook timestamp that is in the
+# FUTURE relative to the local clock. Must match TS server.ts
+# ``TELNYX_FUTURE_SKEW_MS`` (SDK parity).
+_TELNYX_FUTURE_SKEW_MS = 30_000
+
+
 def _validate_telnyx_signature(
     raw_body: bytes,
     signature: str,
@@ -247,10 +253,11 @@ def _validate_telnyx_signature(
     ts_ms = ts * 1000 if ts < 1_000_000_000_000 else ts
     now_ms = int(time.time() * 1000)
     age_ms = now_ms - ts_ms
-    # ``abs`` tolerates small negative skew (timestamp slightly ahead of the
-    # local clock when the webhook host is a touch behind Telnyx) while still
-    # enforcing the ±tolerance anti-replay window.
-    if abs(age_ms) > tolerance_sec * 1000:
+    # Past-dated timestamps get the standard anti-replay tolerance. Future
+    # timestamps are tolerated only up to a small clock-skew allowance
+    # (webhook host a touch behind Telnyx) — a full ±tolerance window would
+    # double the replay surface. Mirrors the TS ``validateTelnyxSignature``.
+    if age_ms > tolerance_sec * 1000 or age_ms < -_TELNYX_FUTURE_SKEW_MS:
         return False
     try:
         from cryptography.exceptions import InvalidSignature

diff --git a/libraries/python/getpatter/services/llm_loop.py b/libraries/python/getpatter/services/llm_loop.py
@@ -1328,6 +1328,14 @@ def _build_messages(self, history: list[dict], user_text: str) -> list[dict]:
             text = entry.get("text", "")
             if role == "assistant":
                 messages.append({"role": "assistant", "content": text})
+            elif role == "tool":
+                # Tool entries in conversation history are display/dashboard
+                # artefacts. Replaying them as ``role: "tool"`` would 400 on
+                # the OpenAI API (no paired assistant ``tool_calls`` message),
+                # and replaying them as ``role: "user"`` fabricates user turns
+                # containing raw tool JSON. Skip them: the tool RESULT is
+                # already reflected in the assistant's following reply.
+                continue
             else:
                 messages.append({"role": "user", "content": text})
 

diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py
@@ -2515,7 +2515,10 @@ def __init__(
         # ``speech_start``. Each frame is 20 ms × 32 bytes (16 kHz ×
         # 16-bit mono) ≈ 640 bytes; capped to 30 frames ≈ 600 ms ≈
         # ~19 KB per concurrent call.
-        self._inbound_audio_ring: list[bytes] = []
+        # ``deque(maxlen=...)`` evicts the oldest frame in O(1) on append —
+        # a plain list with ``pop(0)`` is O(n) and this runs per media frame
+        # (~50/s) while the agent speaks. Cap rationale at the append site.
+        self._inbound_audio_ring: deque[bytes] = deque(maxlen=13)
         # True when VAD fired ``speech_start`` during the agent's turn but
         # the barge-in gate suppressed it. The grace-timer flip drains the
         # ring buffer to STT so the user's words are not silently discarded.
@@ -4296,17 +4299,16 @@ async def on_audio_received(self, audio_bytes: bytes) -> None:
             # agent kept talking; long ones produced truncated
             # transcripts and the agent answered to fragments.
             if self._is_speaking:
-                self._inbound_audio_ring.append(pcm)
-                # Cap to ~250 ms (matching SileroVAD ``min_speech_duration``)
-                # so the post-barge-in replay only recovers the VAD-missed
-                # leading edge of the user's speech, not ~350 ms of
-                # pre-speech silence/agent-bleed. On PSTN (where AEC is a
+                # The deque's ``maxlen=13`` (~260 ms at 20 ms/frame, matching
+                # SileroVAD ``min_speech_duration``) evicts the oldest frame
+                # on append, so the post-barge-in replay only recovers the
+                # VAD-missed leading edge of the user's speech, not ~350 ms
+                # of pre-speech silence/agent-bleed. On PSTN (where AEC is a
                 # no-op) Deepgram trained on English transcribes that
                 # bleed as English garbage and commits it to the LLM as
                 # a phantom user transcript. See BUGS.md 2026-05-05
                 # post-barge-in bleed-transcription entry.
-                if len(self._inbound_audio_ring) > 13:  # ~260 ms at 20 ms/frame
-                    self._inbound_audio_ring.pop(0)
+                self._inbound_audio_ring.append(pcm)
                 # Opt-in: also forward the frame to STT during TTS so the
                 # transcript barge-in path can receive a transcript on
                 # echo-masked links where the VAD never fires. The ring push
@@ -4399,7 +4401,7 @@ async def _begin_speaking(self, is_first_message: bool = False) -> None:
         self._first_audio_sent_at = time.time()
         # Fresh turn — drop any stale pre-barge-in buffer from a previous
         # turn so we never replay yesterday's audio to STT.
-        self._inbound_audio_ring = []
+        self._inbound_audio_ring.clear()
         self._suppressed_speech_pending = False
         # Fresh turn — reset the echo-guard reference so this turn's barge-in
         # checks compare against THIS turn's spoken text, not the last turn's.
@@ -4620,12 +4622,15 @@ async def _flush_inbound_audio_ring(self) -> None:
         if self._stt is None or not self._inbound_audio_ring:
             return
         replayed = len(self._inbound_audio_ring)
-        for buf in self._inbound_audio_ring:
+        # Snapshot before the awaits below — a concurrent media frame could
+        # otherwise mutate the deque mid-iteration (RuntimeError).
+        frames = list(self._inbound_audio_ring)
+        self._inbound_audio_ring.clear()
+        for buf in frames:
             try:
                 await self._stt.send_audio(buf)
             except Exception as exc:
                 logger.debug("send_audio replay failed: %s", exc)
-        self._inbound_audio_ring = []
         logger.debug(
             "Flushed %d pre-turn-end frame(s) (~%d ms) to STT",
             replayed,
@@ -4705,8 +4710,7 @@ async def _send_mark_awaitable(self) -> asyncio.Future | None:
             return None
         self._first_message_mark_counter += 1
         mark_name = f"fm_{self._first_message_mark_counter}"
-        loop = asyncio.get_event_loop()
-        fut: asyncio.Future[None] = loop.create_future()
+        fut: asyncio.Future[None] = asyncio.get_running_loop().create_future()
         self._pending_marks.append((mark_name, fut))
         try:
             await self.audio_sender.send_mark(mark_name)

diff --git a/libraries/python/getpatter/telephony/plivo.py b/libraries/python/getpatter/telephony/plivo.py
@@ -269,15 +269,16 @@ async def send_mark(self, mark_name: str) -> None:
         Plivo acknowledges with a ``playedStream`` event carrying the same
         name once all audio queued before the checkpoint has played out —
         the analogue of Twilio's mark protocol that gates pacing / barge-in.
+        The caller-supplied name goes on the wire verbatim so the ack can be
+        matched back to the waiter that sent it.
         """
         self._chunk_count += 1
-        actual_name = f"audio_{self._chunk_count}"
         await self._ws.send_text(
             json.dumps(
                 {
                     "event": "checkpoint",
                     "streamId": self._stream_id,
-                    "name": actual_name,
+                    "name": mark_name,
                 }
             )
         )

diff --git a/libraries/python/getpatter/telephony/twilio.py b/libraries/python/getpatter/telephony/twilio.py
@@ -181,15 +181,21 @@ async def send_clear(self) -> None:
         )
 
     async def send_mark(self, mark_name: str) -> None:
-        """Send a Twilio media-stream mark frame to track playback completion."""
+        """Send a Twilio media-stream mark frame to track playback completion.
+
+        The caller-supplied name goes on the wire verbatim: Twilio echoes the
+        exact name back, and ``StreamHandler.on_mark`` resolves the matching
+        ``_pending_marks`` waiter by that name. Substituting a locally
+        generated name here would make every waiter miss its echo and fall
+        back to the timeout path.
+        """
         self._chunk_count += 1
-        actual_name = f"audio_{self._chunk_count}"
         await self._ws.send_text(
             json.dumps(
                 {
                     "event": "mark",
                     "streamSid": self._stream_sid,
-                    "mark": {"name": actual_name},
+                    "mark": {"name": mark_name},
                 }
             )
         )

diff --git a/libraries/python/tests/test_llm_loop.py b/libraries/python/tests/test_llm_loop.py
@@ -171,6 +171,31 @@ async def test_build_messages_from_history():
     assert messages[3] == {"role": "user", "content": "How are you?"}
 
 
+@pytest.mark.asyncio
+async def test_build_messages_skips_tool_history_entries():
+    """Tool entries in conversation history are display/dashboard artefacts.
+
+    Replaying them as ``role: "tool"`` would 400 on the OpenAI API (no paired
+    assistant ``tool_calls`` message); the old behaviour replayed them as
+    fabricated ``role: "user"`` turns containing raw tool JSON.
+    """
+    loop = _make_llm_loop()
+
+    history = [
+        {"role": "user", "text": "Transfer me", "timestamp": 1.0},
+        {"role": "tool", "text": 'transfer_call → {"ok": true}', "timestamp": 2.0},
+        {"role": "assistant", "text": "Transferring you now.", "timestamp": 3.0},
+    ]
+    messages = loop._build_messages(history, "Thanks")
+
+    assert messages == [
+        {"role": "system", "content": "You are a test assistant."},
+        {"role": "user", "content": "Transfer me"},
+        {"role": "assistant", "content": "Transferring you now."},
+        {"role": "user", "content": "Thanks"},
+    ]
+
+
 @pytest.mark.asyncio
 async def test_max_iterations_guard():
     """LLM loop stops after max iterations to prevent infinite tool loops."""