Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,19 @@
- **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn.
- **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
- **Forward-STT-without-AEC no longer self-interrupts on its own echo.** The remaining live Hermes/OpenClaw barge-in failure: with `PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC, and no `barge_in_strategies`, a VAD `speech_start` during TTS cancelled the turn immediately — but on a no-AEC link that `speech_start` is very often the agent's *own* TTS echo (or pre-first-token line noise during a long tool-running turn). The result was a cascade of false-positive interruptions: a short normal reply like "bene bene" produced `agent_text='[interrupted]'` with `bargein_ms≈0`, and the next turn's LLM ran for seconds but emitted `tts_characters=0` because it was torn down before its first token. The echo guard existed only on the *transcript* path, so the raw VAD-energy cancel had no protection. The VAD-energy cancel is now **deferred to transcript confirmation** whenever audio is forwarded during TTS without AEC (`forward_stt_while_speaking && aec is None`), exactly as it already was when `barge_in_strategies` are configured: the `speech_start` marks the barge-in *pending* (the agent keeps talking) and the cancel only fires once `_handle_barge_in` / `handleBargeIn` sees a real transcript that survives the echo guard; if none confirms within `barge_in_confirm_ms` (default 1500 ms) the agent resumes its sentence. The default VAD path and forward-STT *with* AEC keep the responsive immediate cancel — no behaviour change for existing configs. For the cleanest short-echo handling, still pair with `echo_cancellation=True` or `barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
- **(Python) Twilio/Plivo mark frames now carry the caller-supplied name — first-message pacing no longer burns the mark-await timeout on every call.** `TwilioAudioSender.send_mark` (and the Plivo checkpoint equivalent) discarded the `mark_name` argument and sent a locally generated `audio_N` instead, so the `fm_N` echo the first-message pacer waited for never matched and every mark resolved via the 0.5 s fallback timeout (~1.5 s of guaranteed extra latency in the barge-in window of every Twilio call). The wire name is now the caller's, matching the TypeScript behaviour. `libraries/python/getpatter/telephony/twilio.py`, `.../telephony/plivo.py`.
- **(TypeScript) Inbound audio frames are now awaited — a transient audio-path error can no longer kill the whole server.** All three carrier WS message handlers called `handler.handleAudio(...)` without `await`, so a rejection inside the audio path (VAD, resampler, STT send) escaped the surrounding `try/catch` and became an unhandled rejection, which terminates the Node process (Node 15+) together with every active call. `libraries/typescript/src/server.ts`.
- **(TypeScript) Telnyx calls no longer leak `activeCallIds` entries.** The Telnyx WS close handler was the only one of the three carriers that never deleted its `ws → call_control_id` map entry, so the map grew for the server's lifetime and graceful shutdown issued hangup REST calls for long-dead calls. `libraries/typescript/src/server.ts`.
- **Pipeline context rebuild no longer replays tool history entries as fabricated user turns.** `role="tool"` transcript entries (display/dashboard artefacts) were mapped to `role: "user"` when rebuilding the OpenAI messages array, injecting raw tool JSON into the conversation as if the caller had said it. They are now skipped — the tool outcome is already reflected in the assistant's following reply. `libraries/python/getpatter/services/llm_loop.py`, `libraries/typescript/src/llm-loop.ts`.
- **(Python) Removed deprecated `asyncio.get_event_loop()` calls from running-loop contexts.** `_send_mark_awaitable` now uses `get_running_loop()`, and the ElevenLabs WS TTS barge-in cancel schedules the close on the loop captured at stream start (with a done-callback that surfaces close errors) instead of relying on `get_event_loop()` + `ensure_future`, which raises on Python 3.12+ when no loop is current on the calling thread. `libraries/python/getpatter/stream_handler.py`, `.../providers/elevenlabs_ws_tts.py`.
- **(Python) The pre-barge-in inbound audio ring is now a `deque(maxlen=13)`.** The previous `list` + `pop(0)` eviction was O(n) and ran per media frame (~50/s) while the agent spoke; the flush path also snapshots before awaiting so a concurrent frame can't mutate the buffer mid-replay. `libraries/python/getpatter/stream_handler.py`.

### Security

- **Telnyx webhook anti-replay window aligned across SDKs: future-dated timestamps are rejected beyond a 30 s clock-skew allowance.** Python previously accepted timestamps up to 5 minutes in the FUTURE (`abs(age)` check), doubling the replay window for pre-issued signatures; TypeScript previously rejected ANY future timestamp, dropping legitimate webhooks on hosts whose clock lags Telnyx by even a second. Both now accept `[-30 s, +300 s]` of age. `libraries/python/getpatter/server.py`, `libraries/typescript/src/server.ts`.
- **(TypeScript) `sanitizeVariables` now strips control characters and caps values at 500 chars (parity with Python).** Caller-supplied template variables (carrier custom params) are interpolated into the system prompt; a newline-bearing value could append adversarial prompt lines. `libraries/typescript/src/server.ts`.
- **(TypeScript) LLM API error bodies are truncated to 200 chars in logs.** Provider 401 bodies have been observed to embed the rejected API-key prefix; the thrown error message was already capped but the ERROR-level log line was not. `libraries/typescript/src/llm-loop.ts`.
- **Dependency advisories resolved via lockfile bump (`npm audit fix`): `hono` → 4.12.21+ (4 moderate advisories: IPv6 deny-rule bypass, Set-Cookie injection, JWT scheme laxness, percent-encoded mount routing) and `vitest` → 3.2.6+ (dev-only critical, Vitest UI arbitrary file read).** `libraries/typescript/package-lock.json`.

## 0.6.5 (2026-06-05)

Expand Down
38 changes: 30 additions & 8 deletions libraries/python/getpatter/providers/elevenlabs_ws_tts.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,12 @@ def __init__(
# ``None`` when no synthesis is in progress.
# Parity with TS ``ElevenLabsWebSocketTTS.activeStreamWs``.
self._active_stream_ws: object = None
# Event loop the in-flight synthesis runs on, captured alongside
# ``_active_stream_ws``. ``cancel_active_stream`` may be invoked
# from sync context / another thread, where
# ``asyncio.get_running_loop()`` is unavailable and the deprecated
# ``get_event_loop()`` raises on Python 3.12+.
self._stream_loop: object = None

@property
def api_key(self) -> str:
Expand Down Expand Up @@ -369,18 +375,32 @@ def cancel_active_stream(self) -> None:
if ws is None:
return
self._active_stream_ws = None
loop = self._stream_loop
self._stream_loop = None
try:
# ``websockets`` connection objects are asyncio-aware; close()
# schedules the close on the running event loop. We fire-and-
# must run on the loop the stream was opened on. We fire-and-
# forget here because cancel_active_stream is called from sync
# context (signal handler / barge-in cancel path).
import asyncio

loop = asyncio.get_event_loop()
if loop.is_running():
loop.call_soon_threadsafe(
lambda: asyncio.ensure_future(ws.close()) # type: ignore[attr-defined]
# context (signal handler / barge-in cancel path), possibly from
# a different thread — hence ``call_soon_threadsafe`` against the
# loop captured at stream start rather than the deprecated
# ``asyncio.get_event_loop()`` (which raises on 3.12+ when no
# loop is current on the calling thread).
def _schedule_close() -> None:
task = asyncio.get_running_loop().create_task(
ws.close() # type: ignore[attr-defined]
)
task.add_done_callback(_log_close_failure)

def _log_close_failure(task: asyncio.Task) -> None:
if task.cancelled():
return
exc = task.exception()
if exc is not None:
logger.debug("ElevenLabs WS close failed: %s", exc)

if loop is not None and not loop.is_closed(): # type: ignore[attr-defined]
loop.call_soon_threadsafe(_schedule_close) # type: ignore[attr-defined]
else:
asyncio.run(ws.close()) # type: ignore[attr-defined]
except Exception:
Expand Down Expand Up @@ -498,6 +518,7 @@ async def synthesize(self, text: str) -> AsyncGenerator[bytes, None]:
# force-close it and unblock the ``await ws.recv()`` below.
# Parity with TS ``ElevenLabsWebSocketTTS.activeStreamWs``.
self._active_stream_ws = ws
self._stream_loop = asyncio.get_running_loop()

while True:
try:
Expand Down Expand Up @@ -566,6 +587,7 @@ async def synthesize(self, text: str) -> AsyncGenerator[bytes, None]:
# local binding and the close below is idempotent.
if self._active_stream_ws is ws:
self._active_stream_ws = None
self._stream_loop = None
# Best-effort: tell the server to stop synthesising any
# buffered text the consumer is no longer interested in.
# Failure to send is non-fatal — the socket close below
Expand Down
15 changes: 11 additions & 4 deletions libraries/python/getpatter/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,12 @@ def _telnyx_hangup_outcome(cause: str) -> str | None:
return None


# Maximum tolerated clock skew for a Telnyx webhook timestamp that is in the
# FUTURE relative to the local clock. Must match TS server.ts
# ``TELNYX_FUTURE_SKEW_MS`` (SDK parity).
_TELNYX_FUTURE_SKEW_MS = 30_000


def _validate_telnyx_signature(
raw_body: bytes,
signature: str,
Expand All @@ -247,10 +253,11 @@ def _validate_telnyx_signature(
ts_ms = ts * 1000 if ts < 1_000_000_000_000 else ts
now_ms = int(time.time() * 1000)
age_ms = now_ms - ts_ms
# ``abs`` tolerates small negative skew (timestamp slightly ahead of the
# local clock when the webhook host is a touch behind Telnyx) while still
# enforcing the ±tolerance anti-replay window.
if abs(age_ms) > tolerance_sec * 1000:
# Past-dated timestamps get the standard anti-replay tolerance. Future
# timestamps are tolerated only up to a small clock-skew allowance
# (webhook host a touch behind Telnyx) — a full ±tolerance window would
# double the replay surface. Mirrors the TS ``validateTelnyxSignature``.
if age_ms > tolerance_sec * 1000 or age_ms < -_TELNYX_FUTURE_SKEW_MS:
return False
try:
from cryptography.exceptions import InvalidSignature
Expand Down
8 changes: 8 additions & 0 deletions libraries/python/getpatter/services/llm_loop.py
Original file line number Diff line number Diff line change
Expand Up @@ -1328,6 +1328,14 @@ def _build_messages(self, history: list[dict], user_text: str) -> list[dict]:
text = entry.get("text", "")
if role == "assistant":
messages.append({"role": "assistant", "content": text})
elif role == "tool":
# Tool entries in conversation history are display/dashboard
# artefacts. Replaying them as ``role: "tool"`` would 400 on
# the OpenAI API (no paired assistant ``tool_calls`` message),
# and replaying them as ``role: "user"`` fabricates user turns
# containing raw tool JSON. Skip them: the tool RESULT is
# already reflected in the assistant's following reply.
continue
else:
messages.append({"role": "user", "content": text})

Expand Down
30 changes: 17 additions & 13 deletions libraries/python/getpatter/stream_handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -2515,7 +2515,10 @@ def __init__(
# ``speech_start``. Each frame is 20 ms × 32 bytes (16 kHz ×
# 16-bit mono) ≈ 640 bytes; capped to 30 frames ≈ 600 ms ≈
# ~19 KB per concurrent call.
self._inbound_audio_ring: list[bytes] = []
# ``deque(maxlen=...)`` evicts the oldest frame in O(1) on append —
# a plain list with ``pop(0)`` is O(n) and this runs per media frame
# (~50/s) while the agent speaks. Cap rationale at the append site.
self._inbound_audio_ring: deque[bytes] = deque(maxlen=13)
# True when VAD fired ``speech_start`` during the agent's turn but
# the barge-in gate suppressed it. The grace-timer flip drains the
# ring buffer to STT so the user's words are not silently discarded.
Expand Down Expand Up @@ -4296,17 +4299,16 @@ async def on_audio_received(self, audio_bytes: bytes) -> None:
# agent kept talking; long ones produced truncated
# transcripts and the agent answered to fragments.
if self._is_speaking:
self._inbound_audio_ring.append(pcm)
# Cap to ~250 ms (matching SileroVAD ``min_speech_duration``)
# so the post-barge-in replay only recovers the VAD-missed
# leading edge of the user's speech, not ~350 ms of
# pre-speech silence/agent-bleed. On PSTN (where AEC is a
# The deque's ``maxlen=13`` (~260 ms at 20 ms/frame, matching
# SileroVAD ``min_speech_duration``) evicts the oldest frame
# on append, so the post-barge-in replay only recovers the
# VAD-missed leading edge of the user's speech, not ~350 ms
# of pre-speech silence/agent-bleed. On PSTN (where AEC is a
# no-op) Deepgram trained on English transcribes that
# bleed as English garbage and commits it to the LLM as
# a phantom user transcript. See BUGS.md 2026-05-05
# post-barge-in bleed-transcription entry.
if len(self._inbound_audio_ring) > 13: # ~260 ms at 20 ms/frame
self._inbound_audio_ring.pop(0)
self._inbound_audio_ring.append(pcm)
# Opt-in: also forward the frame to STT during TTS so the
# transcript barge-in path can receive a transcript on
# echo-masked links where the VAD never fires. The ring push
Expand Down Expand Up @@ -4399,7 +4401,7 @@ async def _begin_speaking(self, is_first_message: bool = False) -> None:
self._first_audio_sent_at = time.time()
# Fresh turn — drop any stale pre-barge-in buffer from a previous
# turn so we never replay yesterday's audio to STT.
self._inbound_audio_ring = []
self._inbound_audio_ring.clear()
self._suppressed_speech_pending = False
# Fresh turn — reset the echo-guard reference so this turn's barge-in
# checks compare against THIS turn's spoken text, not the last turn's.
Expand Down Expand Up @@ -4620,12 +4622,15 @@ async def _flush_inbound_audio_ring(self) -> None:
if self._stt is None or not self._inbound_audio_ring:
return
replayed = len(self._inbound_audio_ring)
for buf in self._inbound_audio_ring:
# Snapshot before the awaits below — a concurrent media frame could
# otherwise mutate the deque mid-iteration (RuntimeError).
frames = list(self._inbound_audio_ring)
self._inbound_audio_ring.clear()
for buf in frames:
try:
await self._stt.send_audio(buf)
except Exception as exc:
logger.debug("send_audio replay failed: %s", exc)
self._inbound_audio_ring = []
logger.debug(
"Flushed %d pre-turn-end frame(s) (~%d ms) to STT",
replayed,
Expand Down Expand Up @@ -4705,8 +4710,7 @@ async def _send_mark_awaitable(self) -> asyncio.Future | None:
return None
self._first_message_mark_counter += 1
mark_name = f"fm_{self._first_message_mark_counter}"
loop = asyncio.get_event_loop()
fut: asyncio.Future[None] = loop.create_future()
fut: asyncio.Future[None] = asyncio.get_running_loop().create_future()
self._pending_marks.append((mark_name, fut))
try:
await self.audio_sender.send_mark(mark_name)
Expand Down
5 changes: 3 additions & 2 deletions libraries/python/getpatter/telephony/plivo.py
Original file line number Diff line number Diff line change
Expand Up @@ -269,15 +269,16 @@ async def send_mark(self, mark_name: str) -> None:
Plivo acknowledges with a ``playedStream`` event carrying the same
name once all audio queued before the checkpoint has played out —
the analogue of Twilio's mark protocol that gates pacing / barge-in.
The caller-supplied name goes on the wire verbatim so the ack can be
matched back to the waiter that sent it.
"""
self._chunk_count += 1
actual_name = f"audio_{self._chunk_count}"
await self._ws.send_text(
json.dumps(
{
"event": "checkpoint",
"streamId": self._stream_id,
"name": actual_name,
"name": mark_name,
}
)
)
Expand Down
12 changes: 9 additions & 3 deletions libraries/python/getpatter/telephony/twilio.py
Original file line number Diff line number Diff line change
Expand Up @@ -181,15 +181,21 @@ async def send_clear(self) -> None:
)

async def send_mark(self, mark_name: str) -> None:
"""Send a Twilio media-stream mark frame to track playback completion."""
"""Send a Twilio media-stream mark frame to track playback completion.

The caller-supplied name goes on the wire verbatim: Twilio echoes the
exact name back, and ``StreamHandler.on_mark`` resolves the matching
``_pending_marks`` waiter by that name. Substituting a locally
generated name here would make every waiter miss its echo and fall
back to the timeout path.
"""
self._chunk_count += 1
actual_name = f"audio_{self._chunk_count}"
await self._ws.send_text(
json.dumps(
{
"event": "mark",
"streamSid": self._stream_sid,
"mark": {"name": actual_name},
"mark": {"name": mark_name},
}
)
)
Expand Down
25 changes: 25 additions & 0 deletions libraries/python/tests/test_llm_loop.py
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,31 @@ async def test_build_messages_from_history():
assert messages[3] == {"role": "user", "content": "How are you?"}


@pytest.mark.asyncio
async def test_build_messages_skips_tool_history_entries():
"""Tool entries in conversation history are display/dashboard artefacts.

Replaying them as ``role: "tool"`` would 400 on the OpenAI API (no paired
assistant ``tool_calls`` message); the old behaviour replayed them as
fabricated ``role: "user"`` turns containing raw tool JSON.
"""
loop = _make_llm_loop()

history = [
{"role": "user", "text": "Transfer me", "timestamp": 1.0},
{"role": "tool", "text": 'transfer_call → {"ok": true}', "timestamp": 2.0},
{"role": "assistant", "text": "Transferring you now.", "timestamp": 3.0},
]
messages = loop._build_messages(history, "Thanks")

assert messages == [
{"role": "system", "content": "You are a test assistant."},
{"role": "user", "content": "Transfer me"},
{"role": "assistant", "content": "Transferring you now."},
{"role": "user", "content": "Thanks"},
]


@pytest.mark.asyncio
async def test_max_iterations_guard():
"""LLM loop stops after max iterations to prevent infinite tool loops."""
Expand Down
Loading
Loading