From 2df311febc134442c84b1235ddbbe434d2075ff8 Mon Sep 17 00:00:00 2001
From: nicolotognoni
Date: Fri, 5 Jun 2026 19:28:11 +0200
Subject: [PATCH 01/11] =?UTF-8?q?feat(llm):=20Hermes=20DX=20=E2=80=94=20TS?=
 =?UTF-8?q?=20namespace=20exports,=20caller-hash=20session=20key,=20long-t?=
 =?UTF-8?q?urn=20filler?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three opt-in developer-experience improvements for the agent-LLM providers,
full Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }`
  -> `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`.
  Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash":
  derive the X-Hermes-Session-Key per call from a SHA-256 caller hash (new
  public SessionContext + hash_caller/hashCaller), so an agent runtime
  remembers a caller across calls WITHOUT the raw phone number ever reaching
  the wire or the logs. The factory takes precedence over the static
  session_key; a falsy return omits the header. The loop dispatch was
  generalised to thread caller/callee only to providers whose stream()
  declares them (or **kwargs) — built-in and minimal custom providers
  unchanged. An unknown session_key_from raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken
  filler when a turn is slow and no audio has reached the caller yet —
  distinct from llm_error_message (which fires on error). Fires once, gated
  on emitted audio; the TS timer is serialised via an async clear() that
  awaits an in-flight filler so it can never overlap the real sentence.

Adversarial review caught and fixed a TS filler double-speak race (the
setTimeout callback could overlap the first real sentence; Python's asyncio
path was immune).

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
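
For reviewers, the caller-hash session key described above can be sketched in a few lines of standalone Python. This is a minimal illustration under stated assumptions, not the code in this patch: `hash_caller` and `SessionContext` follow the shapes the patch adds to `getpatter/models.py`, while `resolve_session_key`, `session_headers`, and `caller_hash_factory` are hypothetical helper names used only here, and the phone number is a made-up example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


def hash_caller(caller: Optional[str]) -> Optional[str]:
    """First 16 hex chars of SHA-256 over the UTF-8 caller; None if no caller."""
    if not caller:
        return None
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class SessionContext:
    """Per-call context handed to a session-key factory (all fields optional)."""
    call_id: Optional[str] = None
    caller: Optional[str] = None
    callee: Optional[str] = None
    caller_hash: Optional[str] = None


Factory = Callable[[SessionContext], Optional[str]]


def caller_hash_factory(ctx: SessionContext) -> Optional[str]:
    # Hypothetical stand-in for the default factory behind
    # session_key_from="caller_hash": scope memory by hash, never the raw number.
    return f"patter-caller-{ctx.caller_hash}" if ctx.caller_hash else None


def resolve_session_key(
    static_key: Optional[str],
    factory: Optional[Factory],
    *,
    call_id: Optional[str] = None,
    caller: Optional[str] = None,
    callee: Optional[str] = None,
) -> Optional[str]:
    # Hypothetical helper: the factory, when present, wins over the static key.
    if factory is not None:
        ctx = SessionContext(call_id, caller, callee, hash_caller(caller))
        return factory(ctx)
    return static_key


def session_headers(static_key: Optional[str], factory: Optional[Factory], **call) -> dict:
    # A falsy resolved value (None or "") omits the header entirely.
    value = resolve_session_key(static_key, factory, **call)
    return {"X-Hermes-Session-Key": value} if value else {}


if __name__ == "__main__":
    print(session_headers("customer-42", caller_hash_factory, caller="+15550100"))
    print(session_headers("customer-42", None))
    print(session_headers("customer-42", caller_hash_factory, caller=None))
```

The last call shows the design choice worth checking in review: with a factory configured and no caller available, the header is omitted rather than falling back to the static key.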
--- CHANGELOG.md | 6 + docs/integrations/hermes.mdx | 26 ++ libraries/python/getpatter/__init__.py | 4 + libraries/python/getpatter/client.py | 13 + libraries/python/getpatter/llm/hermes.py | 44 ++- .../python/getpatter/llm/openai_compatible.py | 86 +++- libraries/python/getpatter/models.py | 58 +++ .../python/getpatter/services/llm_loop.py | 99 +++-- libraries/python/getpatter/stream_handler.py | 94 +++++ .../tests/test_llm_session_key_factory.py | 367 ++++++++++++++++++ .../tests/unit/test_long_turn_filler.py | 312 +++++++++++++++ libraries/typescript/src/index.ts | 13 + libraries/typescript/src/llm-loop.ts | 44 ++- libraries/typescript/src/llm/hermes.ts | 45 ++- .../typescript/src/llm/openai-compatible.ts | 81 +++- libraries/typescript/src/stream-handler.ts | 81 ++++ libraries/typescript/src/types.ts | 42 ++ .../tests/llm-namespace-exports.test.ts | 50 +++ .../llm-session-key-factory.mocked.test.ts | 286 ++++++++++++++ .../tests/long-turn-filler.mocked.test.ts | 341 ++++++++++++++++ 20 files changed, 2016 insertions(+), 76 deletions(-) create mode 100644 libraries/python/tests/test_llm_session_key_factory.py create mode 100644 libraries/python/tests/unit/test_long_turn_filler.py create mode 100644 libraries/typescript/tests/llm-namespace-exports.test.ts create mode 100644 libraries/typescript/tests/llm-session-key-factory.mocked.test.ts create mode 100644 libraries/typescript/tests/long-turn-filler.mocked.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 220bafb..b727355 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ ## Unreleased +### Added + +- **TypeScript namespace exports for the agent-LLM presets.** `import { hermes, openclaw, openaiCompatible } from "getpatter"` now works alongside the existing `HermesLLM` / `OpenClawLLM` / `OpenAICompatibleLLM` named exports, so `new hermes.LLM()` mirrors Python's `from getpatter.llm import hermes; hermes.LLM()`. `libraries/typescript/src/index.ts`. 
+- **`session_key_factory` / `sessionKeyFactory` — per-call long-term memory scope from a caller hash.** `OpenAICompatibleLLM` (and `HermesLLM`) can derive the `X-Hermes-Session-Key` header per call from a `SessionContext` (`call_id` / `caller` / `callee` / `caller_hash`) instead of a static value, so an agent runtime can remember a caller across calls **without the raw phone number ever reaching the wire or the logs**. Shortcut `HermesLLM(session_key_from="caller_hash")` installs a default `patter-caller-` factory (SHA-256, 16 hex chars). New public `SessionContext` + `hash_caller` / `hashCaller` helper. The factory takes precedence over the static `session_key`; a falsy return omits the header. The loop dispatch was generalised to thread `caller` / `callee` only to providers whose `stream()` declares them (or `**kwargs`), keeping built-in and minimal custom providers unchanged. `libraries/python/getpatter/models.py`, `.../llm/openai_compatible.py`, `.../llm/hermes.py`, `.../services/llm_loop.py` + TypeScript mirrors. +- **`long_turn_message` / `longTurnMessage` — opt-in spoken filler during a slow turn.** When an LLM turn takes longer than `long_turn_message_after_s` (default 4 s) and no audio has reached the caller yet, Patter speaks a short configurable line (e.g. "One moment, let me check.") instead of dead silence — useful for agent runtimes (Hermes / OpenClaw) that run tools mid-turn. Distinct from `llm_error_message` (which fires on error): this fires on **slowness**, once per turn, gated on emitted audio so it never double-speaks. `None` / unset = off (no behaviour change). `libraries/python/getpatter/models.py`, `.../stream_handler.py`, `.../client.py` + TypeScript mirrors. + ## 0.6.5 (2026-06-05) ### Added diff --git a/docs/integrations/hermes.mdx b/docs/integrations/hermes.mdx index 7b03ede..006ed0f 100644 --- a/docs/integrations/hermes.mdx +++ b/docs/integrations/hermes.mdx @@ -63,6 +63,23 @@ single turn can take **30–90 s**. 
That is why `HermesLLM` defaults to a **120 s** timeout (the generic provider's 60 s, raised for the preset) instead of the short ceiling used for raw inference providers — a turn that runs a tool isn't cut off mid-thought. +Because a tool-running turn can leave the caller in **silence** for several seconds, the +agent supports an opt-in spoken **filler**: set `long_turn_message` / `longTurnMessage` +(with `long_turn_message_after_s` / `longTurnMessageAfterS`, default 4 s) and Patter speaks +a short line if no audio has reached the caller yet by then. It fires once per turn, only +on slowness, and never overlaps the real reply. (A separate `llm_error_message` / +`llmErrorMessage` covers the gateway-down / timeout **error** case.) + +```python +agent = phone.agent( + stt=DeepgramSTT(), + llm=HermesLLM(), + tts=ElevenLabsTTS(), + long_turn_message="One moment, let me check that.", + long_turn_message_after_s=4, +) +``` + **Where the session lives.** Hermes is **stateless** and keys continuity off **HTTP headers**, not the OpenAI `user` field. Each phone call maps to **one** Hermes @@ -83,6 +100,15 @@ used for raw inference providers — a turn that runs a tool isn't cut off mid-t const llm = new HermesLLM({ sessionKey: 'customer-42' }); ``` + For **per-caller memory without storing the raw phone number**, derive the key from a + caller hash instead of a static value — `HermesLLM(session_key_from="caller_hash")` / + `new HermesLLM({ sessionKeyFrom: 'caller_hash' })` emits + `X-Hermes-Session-Key: patter-caller-` (SHA-256, 16 hex chars), so Hermes + remembers a caller across calls while the raw number never reaches the wire or the + logs. For a custom scheme, pass `session_key_factory` / `sessionKeyFactory`, a callback + that receives a `SessionContext` (`call_id` / `caller` / `callee` / `caller_hash`) and + returns the scope value (a falsy return omits the header for that call). 
+ (Patter also still sends `user=patter-call-` for upstream-log correlation, but that field is **not** what drives the Hermes session — the headers are.) diff --git a/libraries/python/getpatter/__init__.py b/libraries/python/getpatter/__init__.py index 13f529b..0de24fb 100644 --- a/libraries/python/getpatter/__init__.py +++ b/libraries/python/getpatter/__init__.py @@ -47,9 +47,11 @@ OpenAICompatibleConsult, PipelineHooks, RealtimeTurnDetection, + SessionContext, STTConfig, TTSConfig, TurnMetrics, + hash_caller, ) from getpatter.services.barge_in_strategies import ( BargeInStrategy, @@ -419,9 +421,11 @@ def mix_pcm(agent: bytes, bg: bytes, ratio: float) -> bytes: "LatencyBreakdown", "PipelineHooks", "RealtimeTurnDetection", + "SessionContext", "STTConfig", "TTSConfig", "TurnMetrics", + "hash_caller", "BargeInStrategy", "MinWordsStrategy", "evaluate_barge_in_strategies", diff --git a/libraries/python/getpatter/client.py b/libraries/python/getpatter/client.py index a3ce686..12075a4 100644 --- a/libraries/python/getpatter/client.py +++ b/libraries/python/getpatter/client.py @@ -1441,6 +1441,8 @@ def agent( language: str = "en", first_message: str = "", llm_error_message: str | None = None, + long_turn_message: str | None = None, + long_turn_message_after_s: float = 4.0, tools: list[Tool] | None = None, stt: STTProvider | None = None, tts: TTSProvider | None = None, @@ -1482,6 +1484,15 @@ def agent( model: OpenAI Realtime model ID. language: BCP-47 language code, e.g. ``"en"``. first_message: If set, the agent speaks this immediately on connect. + long_turn_message: Pipeline mode only. Opt-in short filler spoken + when a turn is SLOW (e.g. an agent runtime running tools) and no + audio has reached the carrier after + ``long_turn_message_after_s`` seconds — distinct from + ``llm_error_message`` (which fires on an error). ``None`` + (default) keeps today's silence-while-thinking behaviour. Speaks + at most once per turn and never once real audio has started. 
+ long_turn_message_after_s: Seconds to wait before the + ``long_turn_message`` filler fires. Default ``4.0``. tools: List of ``Tool`` instances (build with the ``tool()`` factory). stt: ``STTProvider`` instance for pipeline mode (e.g. ``DeepgramSTT(api_key=...)``). @@ -1659,6 +1670,8 @@ def agent( language=language, first_message=first_message, llm_error_message=llm_error_message, + long_turn_message=long_turn_message, + long_turn_message_after_s=long_turn_message_after_s, tools=tuple(tools_out) if tools_out is not None else None, provider=provider, stt=stt_resolved, diff --git a/libraries/python/getpatter/llm/hermes.py b/libraries/python/getpatter/llm/hermes.py index 0ebfa69..a0f8ece 100644 --- a/libraries/python/getpatter/llm/hermes.py +++ b/libraries/python/getpatter/llm/hermes.py @@ -18,9 +18,10 @@ from __future__ import annotations import os -from typing import ClassVar +from typing import Callable, ClassVar from getpatter.llm.openai_compatible import OpenAICompatibleLLMProvider +from getpatter.models import SessionContext __all__ = ["LLM"] @@ -57,13 +58,26 @@ class LLM(OpenAICompatibleLLMProvider): * per-call continuity → ``X-Hermes-Session-Id: patter-call-`` (always sent with a call id — the primary mechanism) * long-term memory → ``X-Hermes-Session-Key: `` (only sent - when ``session_key`` is configured) + when ``session_key`` / ``session_key_from`` / ``session_key_factory`` is + configured) Args: - session_key: Optional long-term memory scope. When set, every turn - emits ``X-Hermes-Session-Key: `` so Hermes namespaces - persistent memory across calls. Credential-grade — never logged. - ``None`` (default) means the header is not sent. + session_key: Optional STATIC long-term memory scope. When set, every + turn emits ``X-Hermes-Session-Key: `` so Hermes + namespaces persistent memory across calls. Credential-grade — never + logged. ``None`` (default) means the header is not sent. 
+ session_key_from: Convenience selector for a built-in per-call key + derivation. Set to ``"caller_hash"`` to derive the session key per + call as ``f"patter-caller-{ctx.caller_hash}"`` (a stable, + non-reversible hash of the caller — never the raw number), enabling + per-caller cross-call memory. ``None`` (default) uses the static + ``session_key`` path. Ignored when ``session_key_factory`` is given + explicitly. + session_key_factory: Custom callable deriving the + ``X-Hermes-Session-Key`` value per call from a + :class:`getpatter.models.SessionContext`. Takes precedence over both + ``session_key`` and ``session_key_from``. A falsy return omits the + header for that call. Credential-grade — never logged. """ provider_key: ClassVar[str] = "hermes" @@ -76,11 +90,28 @@ def __init__( model: str | None = None, timeout: float = 120.0, session_key: str | None = None, + session_key_from: str | None = None, + session_key_factory: Callable[[SessionContext], str | None] | None = None, **kwargs, ) -> None: resolved_model = model or os.environ.get( "API_SERVER_MODEL_NAME", _DEFAULT_MODEL ) + # ``session_key_from="caller_hash"`` installs a default factory that + # scopes durable memory per caller via the non-reversible caller hash + # (never the raw number). An explicit ``session_key_factory`` always + # wins over this convenience selector. 
+ if session_key_factory is None and session_key_from == "caller_hash": + session_key_factory = ( + lambda ctx: f"patter-caller-{ctx.caller_hash}" + if ctx.caller_hash + else None + ) + elif session_key_from is not None and session_key_from != "caller_hash": + raise ValueError( + "session_key_from must be 'caller_hash' or None, " + f"got {session_key_from!r}" + ) super().__init__( api_key=api_key, base_url=base_url, @@ -92,5 +123,6 @@ def __init__( session_id_prefix=_SESSION_ID_PREFIX, session_key_header=_SESSION_KEY_HEADER, session_key=session_key, + session_key_factory=session_key_factory, **kwargs, ) diff --git a/libraries/python/getpatter/llm/openai_compatible.py b/libraries/python/getpatter/llm/openai_compatible.py index b3d02c6..0825c30 100644 --- a/libraries/python/getpatter/llm/openai_compatible.py +++ b/libraries/python/getpatter/llm/openai_compatible.py @@ -45,8 +45,9 @@ import asyncio import logging import os -from typing import Any, AsyncIterator, ClassVar +from typing import Any, AsyncIterator, Callable, ClassVar +from getpatter.models import SessionContext, hash_caller from getpatter.services.llm_loop import OpenAILLMProvider __all__ = ["OpenAICompatibleLLMProvider", "LLM"] @@ -101,6 +102,17 @@ class OpenAICompatibleLLMProvider(OpenAILLMProvider): credential-grade memory scope and is NEVER logged. ``None`` (default) means the header is omitted even if ``session_key_header`` is set. + session_key_factory: Optional callable that derives the + ``session_key_header`` VALUE per call from a + :class:`getpatter.models.SessionContext` (carrying ``call_id`` / + ``caller`` / ``callee`` / ``caller_hash``). When set it takes + PRECEDENCE over the static ``session_key``: at request-build time + the factory is called and its return value is emitted in + ``session_key_header``. A falsy return (``None`` / ``""``) omits the + header for that call. The static ``session_key`` remains the simple + fallback used when no factory is configured. 
The returned value is a + credential-grade memory scope and is NEVER logged. ``None`` + (default) means the static path is used. **kwargs: Sampling kwargs forwarded to :class:`OpenAILLMProvider`. """ @@ -121,6 +133,7 @@ def __init__( session_id_prefix: str = "", session_key_header: str | None = None, session_key: str | None = None, + session_key_factory: Callable[[SessionContext], str | None] | None = None, **kwargs, ) -> None: try: @@ -155,6 +168,9 @@ def __init__( self._session_key_header = session_key_header # Credential-grade memory scope — never logged. self._session_key = session_key + # When set, derives the session_key_header value per call (caller-hash, + # etc.) and overrides the static session_key. Never logged. + self._session_key_factory = session_key_factory async def warmup(self) -> None: """Pre-call DNS / TLS warmup that omits ``Authorization`` for keyless gateways. @@ -198,21 +214,50 @@ def _record_completion_cost( except Exception: # pragma: no cover — defense in depth logger.debug("_record_completion_cost failed", exc_info=True) + def _resolve_session_key( + self, + *, + call_id: str | None, + caller: str | None, + callee: str | None, + ) -> str | None: + """Resolve the ``session_key_header`` VALUE for this call. + + When a ``session_key_factory`` is configured it is called with a + :class:`SessionContext` (the raw ``caller`` plus its non-reversible + :func:`hash_caller`) and its return value wins — a falsy return omits + the header. Otherwise the static ``session_key`` is used. The result is + a credential-grade memory scope and is never logged. 
+ """ + if self._session_key_factory is not None: + ctx = SessionContext( + call_id=call_id, + caller=caller, + callee=callee, + caller_hash=hash_caller(caller), + ) + return self._session_key_factory(ctx) + return self._session_key + def _build_completion_kwargs( self, messages: list[dict], tools: list[dict] | None, *, call_id: str | None = None, + caller: str | None = None, + callee: str | None = None, ) -> dict[str, Any]: """Assemble ``chat.completions.create`` kwargs, adding session continuity. Extends the parent builder with up to three INDEPENDENT, opt-in session signals — the OpenAI ``user`` field, a per-call session-id - header, and a static memory-scope header. Each is gated separately, so - e.g. a runtime can take the per-call header without the ``user`` field. - Per-call signals require a ``call_id``; the memory-scope header does - not. When none applies the result is byte-identical to the parent + header, and a memory-scope header (static ``session_key`` OR a per-call + value from ``session_key_factory``). Each is gated separately, so e.g. a + runtime can take the per-call header without the ``user`` field. + Per-call ``user`` / session-id signals require a ``call_id``; the + memory-scope header does not (a factory may key off the caller hash + alone). When none applies the result is byte-identical to the parent (no ``user``, no ``extra_headers``). """ kwargs = super()._build_completion_kwargs(messages, tools) @@ -221,11 +266,16 @@ def _build_completion_kwargs( kwargs["user"] = f"{self._session_user_prefix}{call_id}" if self._session_id_header is not None and call_id: extra[self._session_id_header] = f"{self._session_id_prefix}{call_id}" - if self._session_key_header is not None and self._session_key: + if self._session_key_header is not None: # Truthy check (not ``is not None``): an empty-string session key is # not a meaningful memory scope — treat it as unset rather than - # emitting a confusing empty header on the wire. 
- extra[self._session_key_header] = self._session_key + # emitting a confusing empty header on the wire. The factory (when + # configured) takes precedence over the static session_key. + session_key_value = self._resolve_session_key( + call_id=call_id, caller=caller, callee=callee + ) + if session_key_value: + extra[self._session_key_header] = session_key_value if extra: # Merge over any pre-existing extra_headers (the parent never sets # this today, but the spread keeps it future-safe and clobber-free). @@ -239,15 +289,21 @@ async def stream( *, cancel_event: asyncio.Event | None = None, call_id: str | None = None, + caller: str | None = None, + callee: str | None = None, ) -> AsyncIterator[dict]: - """Stream chunks, threading ``call_id`` into the session continuity fields. - - Mirrors :meth:`OpenAILLMProvider.stream` but routes ``call_id`` into - ``_build_completion_kwargs`` so the per-call ``user`` / session header - are emitted. ``call_id`` is optional — unset means the parent-identical - no-session path. + """Stream chunks, threading per-call context into the session fields. + + Mirrors :meth:`OpenAILLMProvider.stream` but routes ``call_id`` (plus + ``caller`` / ``callee`` when a ``session_key_factory`` needs them) into + ``_build_completion_kwargs`` so the per-call ``user`` / session headers + are emitted. All three are optional — unset means the parent-identical + no-session path. ``caller`` is used only to compute the session-key + scope (and its non-reversible hash); it is never logged here. 
""" - kwargs = self._build_completion_kwargs(messages, tools, call_id=call_id) + kwargs = self._build_completion_kwargs( + messages, tools, call_id=call_id, caller=caller, callee=callee + ) response = await self._client.chat.completions.create(**kwargs) last_usage = None diff --git a/libraries/python/getpatter/models.py b/libraries/python/getpatter/models.py index f22c21b..bfdb0cc 100644 --- a/libraries/python/getpatter/models.py +++ b/libraries/python/getpatter/models.py @@ -10,6 +10,7 @@ from __future__ import annotations import asyncio +import hashlib import logging import re from dataclasses import dataclass, field @@ -393,6 +394,47 @@ def __post_init__(self) -> None: ) +def hash_caller(caller: str | None) -> str | None: + """Stable, non-reversible 16-char hash of a caller for session scoping. + + Used to derive a per-caller memory namespace (e.g. an agent runtime's + session key) WITHOUT ever exposing the raw phone number — the call site + keys cross-call memory off the hash, never the number itself. Returns the + first 16 hex chars of the SHA-256 digest of the UTF-8 ``caller`` string, or + ``None`` when ``caller`` is ``None`` / empty (no caller → no scope). The + 16-char (64-bit) truncation is plenty for namespacing while keeping the + emitted header value compact; it is NOT a security primitive (a phone + number has too little entropy to make the digest a secret) — its only job + is to avoid putting the raw number on the wire / in logs. + """ + if not caller: + return None + return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16] + + +@dataclass(frozen=True) +class SessionContext: + """Per-call context handed to a ``session_key_factory``. + + A session-aware LLM provider (e.g. 
:class:`getpatter.llm.hermes.LLM`) can + derive its memory-scope header value per call from this — most usefully + from :attr:`caller_hash`, a stable non-reversible hash of the caller, so + one phone number maps to one durable memory namespace across calls without + the raw number ever being emitted or logged. + + All fields are optional: ``call_id`` / ``caller`` / ``callee`` are present + when the call provides them; ``caller_hash`` is :func:`hash_caller` of + ``caller`` (``None`` when there is no caller). The raw ``caller`` is carried + here only so a factory CAN re-derive its own scope — it must never be put on + the wire or logged beyond what already exists. + """ + + call_id: str | None = None + caller: str | None = None + callee: str | None = None + caller_hash: str | None = None + + @dataclass(frozen=True) class Agent: """Configuration for a local-mode voice AI agent. @@ -429,6 +471,22 @@ class Agent: # behaviour: nothing is spoken on LLM error. Pipeline mode only — # Realtime / ConvAI surface provider errors on their own audio path. llm_error_message: str | None = None + # Opt-in spoken filler for pipeline mode when an LLM turn is SLOW (distinct + # from ``llm_error_message``, which fires on an ERROR). Agent-runtime + # providers (Hermes / OpenClaw) run tools / memory / skills internally, so a + # turn can take many seconds before the first word is spoken — the caller + # hears dead silence. When set to a non-empty string and the turn has + # produced NO audio after ``long_turn_message_after_s`` seconds, the SDK + # synthesizes this line ONCE through the normal TTS turn lifecycle (subject + # to barge-in) to fill the gap. It never fires once real audio has started + # this turn, and never double-speaks. ``None`` (default) keeps today's + # behaviour: nothing is spoken while a slow turn runs. Pipeline mode only. 
+ long_turn_message: str | None = None + # Seconds to wait after the turn begins before the + # ``long_turn_message`` filler fires (only consulted when + # ``long_turn_message`` is set and no audio has reached the carrier yet). + # Default ``4.0`` s. + long_turn_message_after_s: float = 4.0 tools: tuple[dict, ...] | None = None provider: ProviderMode = "openai_realtime" stt: STTConfig | None = None # which STT provider to use in pipeline mode diff --git a/libraries/python/getpatter/services/llm_loop.py b/libraries/python/getpatter/services/llm_loop.py index 2dc52c3..fb4c9f4 100644 --- a/libraries/python/getpatter/services/llm_loop.py +++ b/libraries/python/getpatter/services/llm_loop.py @@ -37,38 +37,59 @@ logger = logging.getLogger("getpatter") -# Per-provider-TYPE memo of whether ``stream`` accepts a ``call_id`` keyword. +# Per-call-context kwargs the loop MAY thread into ``provider.stream`` — but +# only those the provider's signature actually declares (or absorbs via +# ``**kwargs``). ``call_id`` predates ``caller`` / ``callee``: a provider that +# only declares ``call_id`` (every built-in before the session_key_factory +# feature) keeps getting just ``call_id`` and is unaffected by the additions. +_CALL_CONTEXT_STREAM_KWARGS = ("call_id", "caller", "callee") + +# Per-provider-TYPE memo of which call-context kwargs ``stream`` accepts. # Built-in providers declare ``call_id`` (or ``**kwargs``) and hit the fast # path after the first call; a user's minimal custom provider whose ``stream`` # is ``(self, messages, tools=None, *, cancel_event=None)`` is detected once and -# called WITHOUT ``call_id`` thereafter — otherwise it would raise TypeError. -_provider_accepts_call_id: dict[type, bool] = {} +# called WITHOUT any of these thereafter — otherwise it would raise TypeError. 
+_provider_accepted_stream_kwargs: dict[type, frozenset[str]] = {} -def _stream_accepts_call_id(provider: object) -> bool: - """Whether ``provider.stream`` tolerates a ``call_id`` keyword argument. +def _stream_accepted_context_kwargs(provider: object) -> frozenset[str]: + """Which of :data:`_CALL_CONTEXT_STREAM_KWARGS` ``provider.stream`` tolerates. - True when the signature declares a parameter named ``call_id`` OR accepts - ``**kwargs`` (``VAR_KEYWORD``). Cached per provider type to keep the hot - path cheap. Some callables (C-level, ``functools.partial`` without - ``__wrapped__``) refuse introspection — those default to ``False`` so the - safe no-``call_id`` path is taken rather than risking a new crash site. + A name is accepted when the signature declares a parameter of that name OR + the signature accepts ``**kwargs`` (``VAR_KEYWORD``), in which case ALL of + them are accepted. Cached per provider type to keep the hot path cheap. Some + callables (C-level, ``functools.partial`` without ``__wrapped__``) refuse + introspection — those default to the empty set so the safe no-context path + is taken rather than risking a new crash site. 
""" provider_type = type(provider) - cached = _provider_accepts_call_id.get(provider_type) + cached = _provider_accepted_stream_kwargs.get(provider_type) if cached is not None: return cached - accepts = False + accepted: set[str] = set() try: sig = inspect.signature(provider.stream) for param in sig.parameters.values(): - if param.name == "call_id" or param.kind is inspect.Parameter.VAR_KEYWORD: - accepts = True + if param.kind is inspect.Parameter.VAR_KEYWORD: + accepted = set(_CALL_CONTEXT_STREAM_KWARGS) break + if param.name in _CALL_CONTEXT_STREAM_KWARGS: + accepted.add(param.name) except (ValueError, TypeError): # pragma: no cover - exotic callables - accepts = False - _provider_accepts_call_id[provider_type] = accepts - return accepts + accepted = set() + result = frozenset(accepted) + _provider_accepted_stream_kwargs[provider_type] = result + return result + + +def _stream_accepts_call_id(provider: object) -> bool: + """Whether ``provider.stream`` tolerates a ``call_id`` keyword argument. + + Back-compat shim around :func:`_stream_accepted_context_kwargs` (some tests + and external callers still reference this). True when ``call_id`` is among + the accepted call-context kwargs. + """ + return "call_id" in _stream_accepted_context_kwargs(provider) # --------------------------------------------------------------------------- @@ -961,24 +982,32 @@ async def run( _span_cm.__enter__() _span_exc_info: tuple = (None, None, None) try: - # Only thread ``call_id`` into providers whose ``stream`` - # accepts it (or ``**kwargs``). A user's minimal custom provider - # with ``(messages, tools=None, *, cancel_event=None)`` would - # otherwise raise TypeError on the added keyword. ``cancel_event`` - # predates this and every Protocol implementer tolerates it. 
- if _stream_accepts_call_id(self._provider): - stream_iter = self._provider.stream( - messages, - self._openai_tools, - cancel_event=cancel_event, - call_id=call_context.get("call_id"), - ) - else: - stream_iter = self._provider.stream( - messages, - self._openai_tools, - cancel_event=cancel_event, - ) + # Thread only the per-call context kwargs the provider's + # ``stream`` actually declares (or absorbs via ``**kwargs``). A + # provider that declares just ``call_id`` keeps getting only + # ``call_id``; one that also declares ``caller`` / ``callee`` + # (e.g. the OpenAI-compatible provider with a session_key_factory) + # gets those too; a minimal custom provider with neither gets + # none. Each value is only included when present in + # ``call_context``. ``cancel_event`` predates this and every + # Protocol implementer tolerates it. + accepted = _stream_accepted_context_kwargs(self._provider) + context_kwargs = { + name: call_context[name] + for name in _CALL_CONTEXT_STREAM_KWARGS + if name in accepted and name in call_context + } + # ``call_id`` is threaded even when absent (value None) to + # preserve the prior contract where a session-aware provider was + # always handed ``call_id=``. 
+ if "call_id" in accepted and "call_id" not in context_kwargs: + context_kwargs["call_id"] = call_context.get("call_id") + stream_iter = self._provider.stream( + messages, + self._openai_tools, + cancel_event=cancel_event, + **context_kwargs, + ) async for chunk in stream_iter: chunk_type = chunk.get("type") diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index a3b9e12..7967e54 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -3105,6 +3105,70 @@ async def _synthesize_sentence( self.audio_sender.reset_pcm_carry() return True + def _schedule_long_turn_filler( + self, + first_tts_chunk: list, + hook_executor: PipelineHookExecutor, + hook_ctx: HookContext, + ) -> "asyncio.Task | None": + """Spawn the opt-in long-turn filler task, or ``None`` when disabled. + + Returns ``None`` (no task) when ``agent.long_turn_message`` is unset / + empty — the default, byte-identical to today's behaviour. Otherwise + returns a task that waits ``agent.long_turn_message_after_s`` seconds and + then, IFF no audio has reached the carrier this turn + (``first_tts_chunk[0]`` still ``True``) AND we still own the floor + (``self._is_speaking``), synthesizes the filler ONCE via + ``_synthesize_sentence``. Guards strictly on "no audio emitted yet" so it + cannot double-speak; self-synthesis failure degrades to silence. + """ + message = getattr(self.agent, "long_turn_message", None) + if not message: + return None + after_s = getattr(self.agent, "long_turn_message_after_s", 4.0) + + async def _filler() -> None: + try: + await asyncio.sleep(after_s) + except asyncio.CancelledError: + # Cancelled before firing (real audio started / turn ended). + raise + # Fire at most once, only if the caller still heard SILENCE this + # turn and we still hold the floor (no concurrent barge-in). 
+ if first_tts_chunk[0] and self._is_speaking: + try: + await self._synthesize_sentence( + message, hook_executor, hook_ctx, first_tts_chunk + ) + except asyncio.CancelledError: + raise + except Exception: # pragma: no cover - defensive + logger.exception("long_turn_message filler synthesis failed") + + return asyncio.create_task(_filler()) + + async def _cancel_long_turn_filler( + self, task: "asyncio.Task | None" + ) -> None: + """Cancel the long-turn filler task and await its teardown. + + Idempotent and race-safe: a ``None`` / already-finished task is a no-op, + ``CancelledError`` from the cancel is suppressed, and any exception the + task raised before cancellation is swallowed (already logged inside the + task). Returns ``None`` so callers can reassign the handle in one line. + """ + if task is None: + return None + if not task.done(): + task.cancel() + try: + await task + except asyncio.CancelledError: + pass + except Exception: # pragma: no cover - defensive + logger.debug("long_turn_message filler task ended with error", exc_info=True) + return None + async def _process_streaming_response(self, result, call_id: str) -> str: """Process a streaming (async generator) response through TTS with sentence chunking.""" chunker = SentenceChunker( @@ -3120,6 +3184,17 @@ async def _process_streaming_response(self, result, call_id: str) -> str: hook_executor = PipelineHookExecutor(hooks) hook_ctx = self._build_hook_context() + # Opt-in long-turn filler: when the turn is SLOW (agent runtime running + # tools/memory) and NO audio has reached the carrier yet, speak a short + # filler instead of dead silence. Distinct from ``llm_error_message`` + # (that fires on an LLM ERROR; this fires on SLOWNESS). The task waits + # ``long_turn_message_after_s`` then, IFF still no audio this turn AND we + # still own the floor, synthesizes the filler ONCE. Cancelled the moment + # real audio is emitted, on the error branch, and in the finally. 
+ long_turn_task = self._schedule_long_turn_filler( + first_tts_chunk, hook_executor, hook_ctx + ) + # Reset the per-turn LLM cancel event so a stale cancel from a # previous turn cannot terminate this stream prematurely. The # event is *set* by ``_handle_barge_in`` to break out of the @@ -3178,6 +3253,15 @@ async def _process_streaming_response(self, result, call_id: str) -> str: continue # hook dropped this sentence sentence = transformed + # Real audio is about to be synthesized — cancel the + # long-turn filler so it can never fire (or double-speak) + # once the agent's own reply has started. Cancelling + # before the await is race-safe: asyncio is single- + # threaded, so the filler coroutine cannot interleave + # between this cancel and the synthesis call. + long_turn_task = await self._cancel_long_turn_filler( + long_turn_task + ) if not await self._synthesize_sentence( sentence, hook_executor, hook_ctx, first_tts_chunk ): @@ -3190,6 +3274,9 @@ async def _process_streaming_response(self, result, call_id: str) -> str: llm_error = True chunker.reset() # discard partial content on LLM error logger.exception("LLM streaming error: %s", exc) + # The turn errored — stop the filler so it cannot speak over the + # (distinct) error fallback below. + long_turn_task = await self._cancel_long_turn_filler(long_turn_task) # Close the active turn as interrupted so the metrics accumulator # does not leak an open turn when LLM throws mid-stream. if self.metrics is not None and self.metrics.turn_active: @@ -3240,12 +3327,19 @@ async def _process_streaming_response(self, result, call_id: str) -> str: continue sentence = transformed + # Real flushed audio about to play — cancel the filler. 
+ long_turn_task = await self._cancel_long_turn_filler( + long_turn_task + ) if not await self._synthesize_sentence( sentence, hook_executor, hook_ctx, first_tts_chunk ): interrupted = True break finally: + # Ensure the long-turn filler task never outlives the turn (clean + # cancellation, CancelledError suppressed inside the helper). + await self._cancel_long_turn_filler(long_turn_task) # Schedule the flip to idle. Keeps the speaking flag set during # the audio tail still playing on the carrier so STT echo on # the trailing samples doesn't look like a fresh user turn. diff --git a/libraries/python/tests/test_llm_session_key_factory.py b/libraries/python/tests/test_llm_session_key_factory.py new file mode 100644 index 0000000..4a160cc --- /dev/null +++ b/libraries/python/tests/test_llm_session_key_factory.py @@ -0,0 +1,367 @@ +"""Tests for the per-call session-key factory (Feature #7). + +A ``session_key_factory`` derives the memory-scope header value per call from a +:class:`SessionContext` (carrying ``caller`` + its non-reversible +:func:`hash_caller`). The Hermes convenience ``session_key_from="caller_hash"`` +installs a default factory that scopes durable memory per caller WITHOUT the raw +number ever reaching the wire. + +Real code throughout — the only mocked surface is the paid external boundary +(``chat.completions.create``), tagged ``@pytest.mark.mocked``. The factory +resolution, the SessionContext construction, and the caller threading through +the REAL ``LLMLoop`` are all exercised against live code. 
+""" + +from __future__ import annotations + +import pytest + +from getpatter.llm import hermes +from getpatter.llm.openai_compatible import OpenAICompatibleLLMProvider +from getpatter.models import SessionContext, hash_caller +from getpatter.services.llm_loop import LLMLoop, _stream_accepted_context_kwargs + + +# --------------------------------------------------------------------------- +# hash_caller — stable, non-reversible, never the raw number +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +def test_hash_caller_is_stable_and_not_the_raw_number() -> None: + number = "+15555550100" + h1 = hash_caller(number) + h2 = hash_caller(number) + # Deterministic across calls. + assert h1 == h2 + # 16 hex chars (64-bit truncation), and NOT the raw number. + assert len(h1) == 16 + assert all(c in "0123456789abcdef" for c in h1) + assert number not in h1 + assert h1 != number + + +@pytest.mark.unit +def test_hash_caller_distinguishes_different_callers() -> None: + assert hash_caller("+15555550100") != hash_caller("+15555550101") + + +@pytest.mark.unit +def test_hash_caller_none_or_empty_returns_none() -> None: + assert hash_caller(None) is None + assert hash_caller("") is None + + +@pytest.mark.unit +def test_session_context_defaults_are_all_none() -> None: + ctx = SessionContext() + assert ctx.call_id is None + assert ctx.caller is None + assert ctx.callee is None + assert ctx.caller_hash is None + # Frozen (immutable public config). 
+ with pytest.raises(Exception): + ctx.caller = "x" # type: ignore[misc] + + +# --------------------------------------------------------------------------- +# Factory precedence on the generic provider +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +def test_factory_overrides_static_session_key_and_sees_caller_hash() -> None: + seen: dict = {} + + def factory(ctx: SessionContext) -> str: + seen["ctx"] = ctx + return f"scope-{ctx.caller_hash}" + + provider = OpenAICompatibleLLMProvider( + base_url="http://127.0.0.1:9/v1", + model="m", + session_key_header="X-Mem", + session_key="static-key", # must be overridden by the factory + session_key_factory=factory, + ) + kwargs = provider._build_completion_kwargs( + [{"role": "user", "content": "hi"}], + None, + call_id="c1", + caller="+15555550100", + callee="+15555550101", + ) + expected_hash = hash_caller("+15555550100") + assert kwargs["extra_headers"]["X-Mem"] == f"scope-{expected_hash}" + # The factory saw the full SessionContext, including the raw caller and the + # callee — but the EMITTED value carries only the hash. + ctx = seen["ctx"] + assert ctx.call_id == "c1" + assert ctx.caller == "+15555550100" + assert ctx.callee == "+15555550101" + assert ctx.caller_hash == expected_hash + + +@pytest.mark.unit +def test_factory_returning_none_omits_the_header() -> None: + provider = OpenAICompatibleLLMProvider( + base_url="http://127.0.0.1:9/v1", + model="m", + session_key_header="X-Mem", + session_key="static-key", + session_key_factory=lambda ctx: None, + ) + kwargs = provider._build_completion_kwargs( + [{"role": "user", "content": "hi"}], None, call_id="c1", caller="+15555550100" + ) + # Factory returned falsy => header omitted entirely (no extra_headers at all + # here, since nothing else is configured). 
+ assert "extra_headers" not in kwargs + + +@pytest.mark.unit +def test_static_session_key_used_when_no_factory() -> None: + provider = OpenAICompatibleLLMProvider( + base_url="http://127.0.0.1:9/v1", + model="m", + session_key_header="X-Mem", + session_key="static-key", + ) + kwargs = provider._build_completion_kwargs( + [{"role": "user", "content": "hi"}], None, call_id="c1", caller="+15555550100" + ) + assert kwargs["extra_headers"] == {"X-Mem": "static-key"} + + +@pytest.mark.unit +def test_factory_fires_even_without_call_id() -> None: + """The memory-scope header is per-call-independent: a factory keying off the + caller hash alone produces a header even with no call id.""" + provider = OpenAICompatibleLLMProvider( + base_url="http://127.0.0.1:9/v1", + model="m", + session_key_header="X-Mem", + session_key_factory=lambda ctx: f"caller-{ctx.caller_hash}", + ) + kwargs = provider._build_completion_kwargs( + [{"role": "user", "content": "hi"}], None, call_id=None, caller="+15555550100" + ) + assert kwargs["extra_headers"]["X-Mem"] == f"caller-{hash_caller('+15555550100')}" + + +# --------------------------------------------------------------------------- +# Hermes convenience: session_key_from="caller_hash" +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +def test_hermes_session_key_from_installs_caller_hash_factory() -> None: + llm = hermes.LLM(session_key_from="caller_hash") + kwargs = llm._build_completion_kwargs( + [{"role": "user", "content": "hi"}], + None, + call_id="hid-1", + caller="+15555550100", + ) + expected = f"patter-caller-{hash_caller('+15555550100')}" + assert kwargs["extra_headers"]["X-Hermes-Session-Key"] == expected + # Per-call session id still flows alongside the memory scope. 
+ assert kwargs["extra_headers"]["X-Hermes-Session-Id"] == "patter-call-hid-1" + + +@pytest.mark.unit +def test_hermes_session_key_from_omits_header_without_caller() -> None: + llm = hermes.LLM(session_key_from="caller_hash") + kwargs = llm._build_completion_kwargs( + [{"role": "user", "content": "hi"}], None, call_id="hid-1", caller=None + ) + # No caller => no caller_hash => the default factory returns None => header + # omitted. The per-call session-id header is still present. + assert "X-Hermes-Session-Key" not in kwargs["extra_headers"] + assert kwargs["extra_headers"]["X-Hermes-Session-Id"] == "patter-call-hid-1" + + +@pytest.mark.unit +def test_hermes_explicit_factory_wins_over_session_key_from() -> None: + llm = hermes.LLM( + session_key_from="caller_hash", + session_key_factory=lambda ctx: "custom-scope", + ) + kwargs = llm._build_completion_kwargs( + [{"role": "user", "content": "hi"}], None, call_id="hid-1", caller="+15555550100" + ) + assert kwargs["extra_headers"]["X-Hermes-Session-Key"] == "custom-scope" + + +@pytest.mark.unit +def test_hermes_rejects_unknown_session_key_from() -> None: + with pytest.raises(ValueError, match="caller_hash"): + hermes.LLM(session_key_from="something-else") + + +# --------------------------------------------------------------------------- +# Caller threads through the REAL LLMLoop into the provider's stream() +# --------------------------------------------------------------------------- + + +class _CallerRecordingProvider: + """Records the caller/callee/call_id it was streamed with.""" + + def __init__(self) -> None: + self.seen: dict = {} + + async def stream( + self, messages, tools=None, *, cancel_event=None, call_id=None, caller=None, callee=None + ): + self.seen = {"call_id": call_id, "caller": caller, "callee": callee} + yield {"type": "text", "content": "ok"} + + +class _CallIdOnlyProvider: + """Older provider that only declares call_id — must NOT receive caller.""" + + def __init__(self) -> None: + 
self.seen_kwargs: object = "<>" + + async def stream(self, messages, tools=None, *, cancel_event=None, call_id=None): + self.seen_kwargs = call_id + yield {"type": "text", "content": "ok"} + + +def _make_loop(provider) -> LLMLoop: + loop = LLMLoop.__new__(LLMLoop) + loop._provider = provider + loop._system_prompt = "You are a test assistant." + loop._tools = None + loop._tool_executor = None + loop._metrics = None + loop._event_bus = None + loop._model = "fake-model" + loop._provider_name = "fake" + loop._openai_tools = None + loop._tool_map = {} + loop._on_tool_call = None + loop._usage_missing_count = 0 + loop._logged_usage_fallback = False + return loop + + +@pytest.mark.unit +async def test_caller_callee_thread_through_loop_into_provider() -> None: + provider = _CallerRecordingProvider() + loop = _make_loop(provider) + async for _ in loop.run( + "Hi", [], {"call_id": "c9", "caller": "+15555550100", "callee": "+15555550101"} + ): + pass + assert provider.seen == { + "call_id": "c9", + "caller": "+15555550100", + "callee": "+15555550101", + } + + +@pytest.mark.unit +def test_signature_guard_classifies_caller_aware_provider() -> None: + accepted = _stream_accepted_context_kwargs(_CallerRecordingProvider()) + assert accepted == frozenset({"call_id", "caller", "callee"}) + # An older call_id-only provider is not handed caller/callee. 
+ assert _stream_accepted_context_kwargs(_CallIdOnlyProvider()) == frozenset( + {"call_id"} + ) + + +@pytest.mark.unit +async def test_call_id_only_provider_never_receives_caller() -> None: + """A provider that declares only call_id must keep working when the loop has + caller/callee in context — it gets call_id only, never the new kwargs.""" + provider = _CallIdOnlyProvider() + loop = _make_loop(provider) + async for _ in loop.run( + "Hi", [], {"call_id": "c9", "caller": "+15555550100", "callee": "+15555550101"} + ): + pass + assert provider.seen_kwargs == "c9" + + +# --------------------------------------------------------------------------- +# Wire-level — mocks ONLY the paid boundary (chat.completions.create). +# --------------------------------------------------------------------------- + + +class _Choice: + def __init__(self, content) -> None: + self.delta = type("D", (), {"content": content, "tool_calls": None})() + + +class _Chunk: + def __init__(self, content) -> None: + self.choices = [_Choice(content)] + self.usage = None + + +class _FakeStream: + def __init__(self, chunks) -> None: + self._chunks = chunks + + def __aiter__(self): + return self._gen() + + async def _gen(self): + for chunk in self._chunks: + yield chunk + + async def close(self) -> None: # pragma: no cover - not exercised + pass + + +@pytest.mark.mocked +async def test_hermes_caller_hash_reaches_the_wire() -> None: + """End-to-end on the wire: Hermes(session_key_from='caller_hash') emits + X-Hermes-Session-Key=patter-caller-{hash_caller(caller)}, + and the raw caller is NEVER in the header value.""" + llm = hermes.LLM(session_key_from="caller_hash") + captured: dict = {} + + async def fake_create(**kwargs): + captured.update(kwargs) + return _FakeStream([_Chunk("ok")]) + + llm._client.chat.completions.create = fake_create + + caller = "+15555550100" + async for _ in llm.stream( + [{"role": "user", "content": "hi"}], None, call_id="hid-1", caller=caller + ): + pass + + headers = 
captured["extra_headers"] + expected = f"patter-caller-{hash_caller(caller)}" + assert headers["X-Hermes-Session-Key"] == expected + # The raw number is never on the wire in the memory-scope header. + assert caller not in headers["X-Hermes-Session-Key"] + + +@pytest.mark.mocked +async def test_custom_factory_overrides_static_on_the_wire() -> None: + provider = OpenAICompatibleLLMProvider( + base_url="http://127.0.0.1:9/v1", + model="m", + session_key_header="X-Mem", + session_key="static-key", + session_key_factory=lambda ctx: f"dyn-{ctx.caller_hash}", + ) + captured: dict = {} + + async def fake_create(**kwargs): + captured.update(kwargs) + return _FakeStream([_Chunk("ok")]) + + provider._client.chat.completions.create = fake_create + + async for _ in provider.stream( + [{"role": "user", "content": "hi"}], None, call_id="c1", caller="+15555550100" + ): + pass + + assert captured["extra_headers"]["X-Mem"] == f"dyn-{hash_caller('+15555550100')}" diff --git a/libraries/python/tests/unit/test_long_turn_filler.py b/libraries/python/tests/unit/test_long_turn_filler.py new file mode 100644 index 0000000..d162c69 --- /dev/null +++ b/libraries/python/tests/unit/test_long_turn_filler.py @@ -0,0 +1,312 @@ +"""Authentic tests for the opt-in long-turn filler (pipeline mode, Feature #8). + +When an LLM turn is SLOW (e.g. an agent runtime running tools) and NO audio has +reached the carrier after ``agent.long_turn_message_after_s`` seconds, the SDK +speaks a short filler instead of dead silence — distinct from +``llm_error_message`` (which fires on an ERROR, not on slowness). + +Only the external boundary is mocked: the LLM provider's ``stream()`` (its +timing / the gateway hop) and the TTS byte boundary +(``_tts.synthesize`` yielding PCM). 
Everything inward — the real ``LLMLoop.run`` +async generator, the real ``PipelineStreamHandler._process_streaming_response``, +the real filler task scheduling / cancellation, the real ``_synthesize_sentence`` +speak primitive — runs unmocked. The filler timeout is set tiny (a few ms) so +the suite stays fast while exercising the real ``asyncio.sleep`` path. + +These tests carry ``@pytest.mark.mocked`` because the provider stream is an +external-boundary mock. +""" + +from __future__ import annotations + +import asyncio +from collections import deque +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from getpatter.stream_handler import PipelineStreamHandler + +from tests.conftest import make_agent + +_FILLER = "One moment while I look that up." + + +# --------------------------------------------------------------------------- +# Boundary doubles — the ONLY mocks: the LLM stream timing and the TTS bytes +# --------------------------------------------------------------------------- + + +class _SlowThenTextLLMProvider: + """Sleeps past the filler timeout, THEN yields a complete sentence. + + Models an agent runtime that runs tools for a while before producing its + first words: the caller hears silence during ``delay_s``, the filler fires, + and only then the real reply arrives. + """ + + def __init__(self, delay_s: float) -> None: + self._delay_s = delay_s + + async def stream(self, messages, tools=None, **_kwargs): + await asyncio.sleep(self._delay_s) + yield {"type": "text", "content": "Here is your answer. "} + + +class _FastTextLLMProvider: + """Yields a complete sentence immediately (no slow gap).""" + + async def stream(self, messages, tools=None, **_kwargs): + yield {"type": "text", "content": "Quick reply right away. "} + + +class _FakeTTS: + """TTS byte boundary — ``synthesize(text)`` yields a couple of PCM chunks. 
+ + Records every text it was asked to synthesize so a test can assert whether + (and in what order) the filler / the real reply were spoken. + """ + + output_format = "pcm_16000" + + def __init__(self) -> None: + self.synthesized: list[str] = [] + + async def synthesize(self, text: str): + self.synthesized.append(text) + yield b"\x00\x00" * 80 + yield b"\x00\x00" * 80 + + +def _make_loop(provider) -> object: + """Build a REAL ``LLMLoop`` wrapping the boundary provider double.""" + from getpatter.services.llm_loop import LLMLoop + + loop = LLMLoop.__new__(LLMLoop) + loop._provider = provider + loop._system_prompt = "You are a test assistant." + loop._tools = None + loop._tool_executor = None + loop._metrics = None + loop._event_bus = None + loop._model = "fake-model" + loop._provider_name = "fake" + loop._openai_tools = None + loop._tool_map = {} + loop._on_tool_call = None + loop._usage_missing_count = 0 + loop._logged_usage_fallback = False + return loop + + +def _make_handler(*, long_turn_message, long_turn_message_after_s, tts): + audio_sender = AsyncMock() + audio_sender.reset_pcm_carry = MagicMock() + overrides: dict = {"long_turn_message": long_turn_message} + if long_turn_message_after_s is not None: + overrides["long_turn_message_after_s"] = long_turn_message_after_s + handler = PipelineStreamHandler( + agent=make_agent(**overrides), + audio_sender=audio_sender, + call_id="call-long-turn", + caller="+15551110000", + callee="+15552220000", + resolved_prompt="p", + metrics=None, + for_twilio=True, + on_transcript=None, + conversation_history=deque(maxlen=10), + transcript_entries=deque(maxlen=10), + ) + handler.on_message = None + handler._tts = tts # type: ignore[assignment] + handler._is_speaking = True + return handler + + +# --------------------------------------------------------------------------- +# Positive: slow turn + message set → filler is spoken before the real reply +# --------------------------------------------------------------------------- 
+ + +@pytest.mark.mocked +class TestFillerSpokenOnSlowTurn: + async def test_filler_spoken_when_turn_is_slow(self) -> None: + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=_FILLER, + long_turn_message_after_s=0.02, + tts=tts, + ) + # The provider takes 80 ms before its first word; the filler fires at + # 20 ms — so the caller hears the filler, then the real reply. + loop = _make_loop(_SlowThenTextLLMProvider(delay_s=0.08)) + + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + await handler._process_streaming_response(result, "call-long-turn") + + # The filler was synthesized (and reached the carrier) FIRST, then the + # real reply followed — exactly one filler, no double-speak. + assert _FILLER in tts.synthesized + assert tts.synthesized.index(_FILLER) == 0 + assert "Here is your answer." in tts.synthesized + assert tts.synthesized.count(_FILLER) == 1 + handler.audio_sender.send_audio.assert_awaited() + + +# --------------------------------------------------------------------------- +# Negative: fast turn → filler must NOT fire (no race / no double-speak) +# --------------------------------------------------------------------------- + + +@pytest.mark.mocked +class TestFillerNotSpokenOnFastTurn: + async def test_filler_not_spoken_when_audio_starts_quickly(self) -> None: + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=_FILLER, + long_turn_message_after_s=0.5, # well beyond the fast reply + tts=tts, + ) + loop = _make_loop(_FastTextLLMProvider()) + + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + await handler._process_streaming_response(result, "call-long-turn") + + # Real audio started immediately; the filler is cancelled before firing. + assert _FILLER not in tts.synthesized + assert "Quick reply right away." 
in tts.synthesized + + async def test_no_orphaned_filler_task_after_fast_turn(self) -> None: + """The filler task must be cleanly cancelled — no pending filler task + lingers past the turn (the cancellation path awaits/suppresses + CancelledError). Captures the actual task handle the handler created.""" + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=_FILLER, + long_turn_message_after_s=0.5, + tts=tts, + ) + loop = _make_loop(_FastTextLLMProvider()) + + created: list = [] + real_schedule = handler._schedule_long_turn_filler + + def _capture(*args, **kwargs): + task = real_schedule(*args, **kwargs) + if task is not None: + created.append(task) + return task + + handler._schedule_long_turn_filler = _capture # type: ignore[assignment] + + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + await handler._process_streaming_response(result, "call-long-turn") + # Yield once so any cancelled task fully tears down. + await asyncio.sleep(0) + + # The filler task was created and is now finished (cancelled cleanly), + # not left pending past the turn. + assert len(created) == 1 + assert created[0].done() + assert created[0].cancelled() + + +# --------------------------------------------------------------------------- +# Regression: feature OFF by default → behaviour unchanged +# --------------------------------------------------------------------------- + + +@pytest.mark.mocked +class TestFillerOffByDefault: + async def test_unset_message_speaks_nothing_extra(self) -> None: + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=None, # default — feature OFF + long_turn_message_after_s=None, + tts=tts, + ) + # Even on a slow turn, with the message unset nothing extra is spoken. + loop = _make_loop(_SlowThenTextLLMProvider(delay_s=0.05)) + + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + await handler._process_streaming_response(result, "call-long-turn") + + # Only the real reply — no filler ever synthesized. 
+ assert tts.synthesized == ["Here is your answer."] + + +# --------------------------------------------------------------------------- +# Barge-in guard: floor flipped off during the slow gap → filler stays silent +# --------------------------------------------------------------------------- + + +@pytest.mark.mocked +class TestFillerSuppressedByBargeIn: + async def test_filler_not_spoken_when_floor_flips_off_before_firing(self) -> None: + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=_FILLER, + long_turn_message_after_s=0.02, + tts=tts, + ) + + class _SlowFlipThenText: + async def stream(self, messages, tools=None, **_kwargs): + # Concurrent barge-in flips the floor off before the filler + # timeout elapses — the filler must observe ``_is_speaking`` is + # False and stay silent. + handler._is_speaking = False + await asyncio.sleep(0.08) + handler._is_speaking = True # restored for the (real) reply path + yield {"type": "text", "content": "Late reply. "} + + loop = _make_loop(_SlowFlipThenText()) + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + await handler._process_streaming_response(result, "call-long-turn") + + assert _FILLER not in tts.synthesized + + +# --------------------------------------------------------------------------- +# Authenticity invariant: the positive test exercises the REAL speak primitive +# --------------------------------------------------------------------------- + + +@pytest.mark.mocked +class TestExercisesRealSpeakPrimitive: + async def test_fails_if_synthesize_sentence_is_not_real(self) -> None: + tts = _FakeTTS() + handler = _make_handler( + long_turn_message=_FILLER, + long_turn_message_after_s=0.02, + tts=tts, + ) + loop = _make_loop(_SlowThenTextLLMProvider(delay_s=0.08)) + + filler_attempts: list[str] = [] + real_synth = handler._synthesize_sentence + + async def _broken_for_filler(sentence, *args, **kwargs): + # The FILLER's speak primitive is broken (its own try/except must + # swallow the failure 
→ no filler PCM reaches the carrier). The real + # reply still routes to the genuine primitive, so the turn completes. + if sentence == _FILLER: + filler_attempts.append(sentence) + raise NotImplementedError + return await real_synth(sentence, *args, **kwargs) + + handler._synthesize_sentence = _broken_for_filler # type: ignore[assignment] + + result = loop.run("Hi", [], {"call_id": "call-long-turn"}) + # Must not raise — a filler-primitive outage degrades to silence, not a + # handler crash, and the real reply still plays. + await handler._process_streaming_response(result, "call-long-turn") + + # The filler attempted to speak through the (now-broken) primitive but + # its bytes never reached the carrier — proving the positive test above + # depends on the REAL primitive running, not a mock. + assert filler_attempts == [_FILLER] + assert _FILLER not in tts.synthesized + # The real reply still played (broken filler never blocked the turn). + assert "Here is your answer." in tts.synthesized diff --git a/libraries/typescript/src/index.ts b/libraries/typescript/src/index.ts index 1b99333..cbd16a0 100644 --- a/libraries/typescript/src/index.ts +++ b/libraries/typescript/src/index.ts @@ -32,6 +32,7 @@ export type { PipelineHooks, HookContext, RealtimeTurnDetection, + SessionContext, } from "./types"; // `Guardrail` is intentionally not re-exported from `./types` — the public // `Guardrail` identifier is the class from `./public-api` (exported below), @@ -192,11 +193,23 @@ export type { GoogleLLMOptions } from "./llm/google"; // OpenAI-compatible agent runtime / local inference gateway). 
export { LLM as OpenAICompatibleLLM, OpenAICompatibleLLMProvider } from "./llm/openai-compatible"; export type { OpenAICompatibleLLMOptions } from "./llm/openai-compatible"; +export { hashCaller } from "./llm/openai-compatible"; export { LLM as HermesLLM } from "./llm/hermes"; export type { HermesLLMOptions } from "./llm/hermes"; export { LLM as OpenClawLLM } from "./llm/openclaw"; export type { OpenClawLLMOptions } from "./llm/openclaw"; +// Namespace objects mirroring the Python ``from getpatter.llm import hermes`` +// ergonomics: ``import { hermes } from 'getpatter'; new hermes.LLM()`` builds +// the SAME class as the ``HermesLLM`` named export above. Provided alongside +// (not instead of) the named exports. +import { LLM as HermesLLMClass } from "./llm/hermes"; +import { LLM as OpenClawLLMClass } from "./llm/openclaw"; +import { LLM as OpenAICompatibleLLMClass } from "./llm/openai-compatible"; +export const hermes = Object.freeze({ LLM: HermesLLMClass }); +export const openclaw = Object.freeze({ LLM: OpenClawLLMClass }); +export const openaiCompatible = Object.freeze({ LLM: OpenAICompatibleLLMClass }); + // Voice Activity Detection (server-side) — Silero ONNX. export { SileroVAD } from "./providers/silero-vad"; export type { SileroVADOptions, SileroSampleRate } from "./providers/silero-vad"; diff --git a/libraries/typescript/src/llm-loop.ts b/libraries/typescript/src/llm-loop.ts index 6005f2b..3f2f7d2 100644 --- a/libraries/typescript/src/llm-loop.ts +++ b/libraries/typescript/src/llm-loop.ts @@ -432,6 +432,17 @@ export interface LLMStreamOptions { * config) no ``user`` field is sent — fully backward compatible. */ callId?: string; + /** + * Caller / callee for this turn (the same values the stream handler builds + * into ``callCtx.caller`` / ``callCtx.callee``). Threaded purely so a + * session-aware provider with a ``sessionKeyFactory`` can derive a per-caller + * memory scope from the NON-REVERSIBLE caller hash. 
Additive and optional: + * providers that read only ``signal`` / ``callId`` ignore them, and the raw + * ``caller`` is never logged. Mirrors the Python loop threading + * ``caller`` / ``callee`` into the provider's ``stream``. + */ + caller?: string; + callee?: string; } /** @@ -921,17 +932,30 @@ export class LLMLoop { const hasAfterLlmChunk = Boolean(hookExecutor?.hasAfterLlmChunk()); const allEmittedText: string[] = []; - // Thread the stable per-call id into the provider stream options so - // session-aware providers (OpenAI-compatible / Hermes / OpenClaw) can - // emit the ``user`` field for one runtime session per phone call. Purely - // additive: providers that read only ``signal`` ignore it. Only spread a - // string call id — leave ``opts`` untouched otherwise so existing - // behaviour is byte-identical when no call id is present. + // Thread the stable per-call id (plus caller / callee for a + // sessionKeyFactory) into the provider stream options so session-aware + // providers (OpenAI-compatible / Hermes / OpenClaw) can emit the ``user`` + // field / memory-scope header for one runtime session per phone call. + // Purely additive: providers that read only ``signal`` ignore them. Only + // build the augmented opts when at least one context value is a non-empty + // string — leave ``opts`` untouched otherwise so existing behaviour is + // byte-identical when no call context is present. The raw caller is never + // logged; a factory keys off its non-reversible hash. const callId = callContext.call_id; - const streamOpts: LLMStreamOptions | undefined = - typeof callId === 'string' && callId.length > 0 - ? { ...opts, callId } - : opts; + const caller = callContext.caller; + const callee = callContext.callee; + const hasContext = + (typeof callId === 'string' && callId.length > 0) || + (typeof caller === 'string' && caller.length > 0) || + (typeof callee === 'string' && callee.length > 0); + const streamOpts: LLMStreamOptions | undefined = hasContext + ? 
{ + ...opts, + ...(typeof callId === 'string' && callId.length > 0 ? { callId } : {}), + ...(typeof caller === 'string' && caller.length > 0 ? { caller } : {}), + ...(typeof callee === 'string' && callee.length > 0 ? { callee } : {}), + } + : opts; for (let iter = 0; iter < maxIterations; iter++) { const toolCallsAccumulated = new Map(); diff --git a/libraries/typescript/src/llm/hermes.ts b/libraries/typescript/src/llm/hermes.ts index ef05517..39a441c 100644 --- a/libraries/typescript/src/llm/hermes.ts +++ b/libraries/typescript/src/llm/hermes.ts @@ -14,6 +14,7 @@ * pass ``sessionKey``. (It also still emits ``user=patter-call-`` for * upstream-log correlation, but that is not what drives the session.) */ +import type { SessionContext } from '../types'; import { OpenAICompatibleLLMProvider, type OpenAICompatibleLLMOptions, @@ -49,11 +50,28 @@ export interface HermesLLMOptions { /** Per-request timeout in seconds. Default ``120``. */ timeout?: number; /** - * Long-term memory scope. When set, emits ``X-Hermes-Session-Key`` so Hermes - * scopes durable memory to this value across calls. ``undefined`` (default) - * means the header is not sent. Credential-grade — never logged. + * Static long-term memory scope. When set, emits ``X-Hermes-Session-Key`` so + * Hermes scopes durable memory to this value across calls. ``undefined`` + * (default) means the header is not sent. Credential-grade — never logged. */ sessionKey?: string; + /** + * Convenience selector for a built-in per-call key derivation. Set to + * ``'caller_hash'`` to derive the session key per call as + * ``` `patter-caller-${ctx.callerHash}` ``` (a stable, non-reversible hash of + * the caller — never the raw number), enabling per-caller cross-call memory. + * ``undefined`` (default) uses the static ``sessionKey`` path. Ignored when + * ``sessionKeyFactory`` is given explicitly. Mirrors Python + * ``session_key_from``. 
+ */ + sessionKeyFrom?: 'caller_hash'; + /** + * Custom callback deriving the ``X-Hermes-Session-Key`` value per call from a + * {@link SessionContext}. Takes precedence over both ``sessionKey`` and + * ``sessionKeyFrom``. A falsy return omits the header for that call. + * Credential-grade — never logged. Mirrors Python ``session_key_factory``. + */ + sessionKeyFactory?: (ctx: SessionContext) => string | undefined; /** Extra headers merged after the SDK ``User-Agent``. */ extraHeaders?: Record; /** Sampling temperature [0, 2]. */ @@ -93,6 +111,26 @@ export class LLM extends OpenAICompatibleLLMProvider { constructor(opts: HermesLLMOptions = {}) { const model = opts.model ?? process.env[MODEL_ENV] ?? DEFAULT_MODEL; + // ``sessionKeyFrom: 'caller_hash'`` installs a default factory that scopes + // durable memory per caller via the non-reversible caller hash (never the + // raw number). An explicit ``sessionKeyFactory`` always wins over it. + let sessionKeyFactory = opts.sessionKeyFactory; + if (!sessionKeyFactory && opts.sessionKeyFrom === 'caller_hash') { + sessionKeyFactory = (ctx: SessionContext): string | undefined => + ctx.callerHash ? `patter-caller-${ctx.callerHash}` : undefined; + } else if ( + opts.sessionKeyFrom !== undefined && + opts.sessionKeyFrom !== 'caller_hash' + ) { + // Runtime validation for non-TypeScript / dynamic-JS / JSON callers — the + // literal type already catches this at compile time. Mirrors Python's + // ValueError so a misconfigured key derivation fails loudly, not silently. 
+ throw new Error( + `sessionKeyFrom must be 'caller_hash' or undefined, got ${JSON.stringify( + opts.sessionKeyFrom, + )}`, + ); + } const options: OpenAICompatibleLLMOptions = { apiKey: opts.apiKey, apiKeyEnv: API_KEY_ENV, @@ -104,6 +142,7 @@ export class LLM extends OpenAICompatibleLLMProvider { sessionIdPrefix: SESSION_ID_PREFIX, sessionKeyHeader: SESSION_KEY_HEADER, sessionKey: opts.sessionKey, + sessionKeyFactory, extraHeaders: opts.extraHeaders, temperature: opts.temperature, maxTokens: opts.maxTokens, diff --git a/libraries/typescript/src/llm/openai-compatible.ts b/libraries/typescript/src/llm/openai-compatible.ts index 2dd7f78..a19a399 100644 --- a/libraries/typescript/src/llm/openai-compatible.ts +++ b/libraries/typescript/src/llm/openai-compatible.ts @@ -44,16 +44,36 @@ * ``Bearer EMPTY`` placeholder breaks some gateways). */ +import { createHash } from 'node:crypto'; import type { LLMChunk, LLMProvider, LLMStreamOptions } from '../llm-loop'; import { mergeAbortSignals } from '../llm-loop'; import { parseOpenAISseStream } from '../providers/groq-llm'; import { PatterConnectionError } from '../errors'; import { getLogger } from '../logger'; +import type { SessionContext } from '../types'; import { VERSION } from '../version'; /** Default per-request timeout in seconds for the generic provider. */ const DEFAULT_TIMEOUT_S = 60; +/** + * Stable, non-reversible 16-char hash of a caller for session scoping. + * + * Used to derive a per-caller memory namespace (e.g. an agent runtime's session + * key) WITHOUT ever exposing the raw phone number — the call site keys cross- + * call memory off the hash, never the number itself. Returns the first 16 hex + * chars of the SHA-256 digest of the UTF-8 ``caller`` string, or ``undefined`` + * when ``caller`` is undefined / empty. 
The 16-char (64-bit) truncation is + * plenty for namespacing while keeping the emitted header value compact; it is + * NOT a security primitive (a phone number has too little entropy to make the + * digest a secret) — its only job is to keep the raw number off the wire / out + * of logs. Mirrors Python ``hash_caller``. + */ +export function hashCaller(caller?: string): string | undefined { + if (!caller) return undefined; + return createHash('sha256').update(caller, 'utf8').digest('hex').slice(0, 16); +} + /** Constructor options for {@link OpenAICompatibleLLMProvider}. */ export interface OpenAICompatibleLLMOptions { /** @@ -118,6 +138,17 @@ export interface OpenAICompatibleLLMOptions { * scope — NEVER logged. ``undefined`` (default) means the header is omitted. */ sessionKey?: string; + /** + * Optional callback that derives the ``sessionKeyHeader`` VALUE per call from + * a {@link SessionContext} (carrying ``callId`` / ``caller`` / ``callee`` / + * ``callerHash``). When set it takes PRECEDENCE over the static ``sessionKey``: + * at request-build time the factory is called and its return value is emitted + * in ``sessionKeyHeader``. A falsy return (``undefined`` / ``''``) omits the + * header for that call. The static ``sessionKey`` remains the simple fallback + * used when no factory is configured. The returned value is a credential-grade + * memory scope and is NEVER logged. Mirrors Python ``session_key_factory``. + */ + sessionKeyFactory?: (ctx: SessionContext) => string | undefined; /** Sampling temperature [0, 2]. */ temperature?: number; /** Max tokens in the assistant response (sent as ``max_completion_tokens``). 
*/ @@ -165,6 +196,7 @@ export class OpenAICompatibleLLMProvider implements LLMProvider { private readonly sessionIdPrefix?: string; private readonly sessionKeyHeader?: string; private readonly sessionKey?: string; + private readonly sessionKeyFactory?: (ctx: SessionContext) => string | undefined; private readonly temperature?: number; private readonly maxTokens?: number; private readonly responseFormat?: Record; @@ -199,6 +231,7 @@ export class OpenAICompatibleLLMProvider implements LLMProvider { this.sessionIdPrefix = options.sessionIdPrefix; this.sessionKeyHeader = options.sessionKeyHeader; this.sessionKey = options.sessionKey; + this.sessionKeyFactory = options.sessionKeyFactory; this.temperature = options.temperature; this.maxTokens = options.maxTokens; this.responseFormat = options.responseFormat; @@ -223,7 +256,11 @@ export class OpenAICompatibleLLMProvider implements LLMProvider { * - ``sessionKeyHeader`` (+ ``sessionKey``) → the static ``sessionKey`` value. * ``sessionKey`` is a credential-grade memory scope and is never logged. */ - private buildHeaders(callId?: string): Record { + private buildHeaders( + callId?: string, + caller?: string, + callee?: string, + ): Record { const headers: Record = { 'Content-Type': 'application/json', 'User-Agent': `getpatter/${VERSION}`, @@ -236,15 +273,43 @@ export class OpenAICompatibleLLMProvider implements LLMProvider { // Per-call session id for session / transcript continuity. headers[this.sessionIdHeader] = `${this.sessionIdPrefix ?? ''}${callId}`; } - if (this.sessionKeyHeader && this.sessionKey) { - // Truthy check (not `!== undefined`): an empty-string session key is not - // a meaningful memory scope — treat it as unset rather than emitting a - // confusing empty header. Value is the raw key (never logged). - headers[this.sessionKeyHeader] = this.sessionKey; + if (this.sessionKeyHeader) { + // The factory (when configured) wins over the static sessionKey. 
Truthy + // check (not `!== undefined`): an empty-string scope is not meaningful — + // treat it as unset rather than emitting a confusing empty header. The + // value is credential-grade and never logged. + const sessionKeyValue = this.resolveSessionKey(callId, caller, callee); + if (sessionKeyValue) { + headers[this.sessionKeyHeader] = sessionKeyValue; + } } return headers; } + /** + * Resolve the ``sessionKeyHeader`` VALUE for this call. When a + * ``sessionKeyFactory`` is configured it is called with a + * {@link SessionContext} (the raw ``caller`` plus its non-reversible + * {@link hashCaller}) and its return value wins — a falsy return omits the + * header. Otherwise the static ``sessionKey`` is used. Never logged. + */ + private resolveSessionKey( + callId?: string, + caller?: string, + callee?: string, + ): string | undefined { + if (this.sessionKeyFactory) { + const ctx: SessionContext = { + callId, + caller, + callee, + callerHash: hashCaller(caller), + }; + return this.sessionKeyFactory(ctx); + } + return this.sessionKey; + } + /** * Pre-call DNS / TLS warmup for the configured endpoint. Best-effort: * 5 s timeout, all exceptions swallowed at debug level. 
The ``Authorization`` @@ -308,11 +373,13 @@ export class OpenAICompatibleLLMProvider implements LLMProvider { opts?: LLMStreamOptions, ): AsyncGenerator { const callId = opts?.callId; + const caller = opts?.caller; + const callee = opts?.callee; const body = this.buildBody(messages, tools, callId); const response = await fetch(`${this.baseUrl}/chat/completions`, { method: 'POST', - headers: this.buildHeaders(callId), + headers: this.buildHeaders(callId, caller, callee), body: JSON.stringify(body), signal: mergeAbortSignals(opts?.signal, AbortSignal.timeout(this.timeoutMs)), }); diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index 38a1198..342676c 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -2690,6 +2690,64 @@ export class StreamHandler { return true; } + /** + * Schedule the opt-in long-turn filler and return its async ``clear()``. + * + * When ``agent.longTurnMessage`` is unset / empty the returned clear is a + * no-op (byte-identical to today's behaviour). Otherwise a one-shot timer + * fires after ``agent.longTurnMessageAfterS`` seconds and, IFF no audio has + * reached the carrier this turn (``!ttsFirstByteSent.value``) AND we still own + * the floor (``this.isSpeaking``), synthesizes the filler ONCE via the same + * per-sentence TTS primitive every sentence uses. + * + * The returned ``clear()`` is **async**: it stops the timer AND, if the filler + * already started synthesizing (its ``setTimeout`` callback runs in a separate + * macro-task, so it can fire just before the first real sentence), AWAITS the + * in-flight synthesis so the filler audio can never interleave with the real + * sentence that follows. Idempotent; self-synthesis failure degrades to + * silence (never crashes the turn). The caller must clear on first real audio, + * on the error branch, and in the finally. 
+ */ + private scheduleLongTurnFiller( + ttsFirstByteSent: { value: boolean }, + hookExecutor: PipelineHookExecutor, + hookCtx: HookContext, + label: string, + ): () => Promise<void> { + const message = this.deps.agent.longTurnMessage; + if (!message) return async () => {}; + const afterS = this.deps.agent.longTurnMessageAfterS ?? 4.0; + let cancelled = false; + let inFlight: Promise<void> | null = null; + const timer = setTimeout(() => { + // Fire at most once, only if the caller still heard SILENCE this turn, we + // still hold the floor, and the turn has not already moved on. + if (cancelled || ttsFirstByteSent.value || !this.isSpeaking) return; + // Track the in-flight synthesis so clear() can await it, serializing the + // filler before the real sentence so their audio can never interleave. + inFlight = this.synthesizeSentence( + message, + hookExecutor, + hookCtx, + ttsFirstByteSent, + ).catch((err) => { + getLogger().error( + `longTurnMessage filler synthesis failed (${label}):`, + err, + ); + }); + }, Math.max(0, afterS * 1000)); + return async () => { + cancelled = true; + clearTimeout(timer); + if (inFlight !== null) { + const pending = inFlight; + inFlight = null; + await pending; + } + }; + } + /** * Streaming built-in LLM path with sentence chunking and per-sentence * guardrails/TTS. Returns the concatenated response text. @@ -2716,6 +2774,20 @@ export class StreamHandler { const llmSignal = this.llmAbort.signal; let llmError = false; + // Opt-in long-turn filler: when the turn is SLOW (agent runtime running + // tools/memory) and NO audio has reached the carrier yet, speak a short + // filler instead of dead silence. Distinct from ``llmErrorMessage`` (that + // fires on an LLM ERROR; this fires on SLOWNESS). The timer waits + // ``longTurnMessageAfterS`` then, IFF still no audio this turn AND we still + // own the floor, synthesizes the filler ONCE. Cleared the moment real audio + // is emitted, on the error branch, and in the finally.
+ const clearLongTurnFiller = this.scheduleLongTurnFiller( + ttsFirstByteSent, + hookExecutor, + hookCtx, + label, + ); + // Span lifetime: LLM dispatch → final token / TTS handoff. Always closed // in the ``finally`` block so an early throw cannot leak a span. const llmSpan = startSpan(SPAN_LLM, { 'patter.call.id': this.callId }); @@ -2736,6 +2808,9 @@ export class StreamHandler { if (transformed === null) return; // hook dropped this sentence sentenceText = transformed; } + // Real audio is about to play — cancel the long-turn filler so it can + // never fire (or double-speak) once the agent's own reply has started. + await clearLongTurnFiller(); await this.synthesizeSentence(sentenceText, hookExecutor, hookCtx, ttsFirstByteSent); }; let firstSentenceEmitted = false; @@ -2769,6 +2844,9 @@ export class StreamHandler { // Treat AbortError as a clean barge-in cancellation, not an LLM error. const isAbort = (e as Error)?.name === 'AbortError' || llmSignal.aborted; + // The turn ended (error or clean abort) — stop the filler so it cannot + // speak over the error fallback below or after a barge-in. + await clearLongTurnFiller(); if (!isAbort) { llmError = true; chunker.reset(); // discard partial content on LLM error @@ -2810,6 +2888,9 @@ export class StreamHandler { } } } finally { + // Ensure the long-turn filler never outlives the turn (idempotent — a + // no-op when already cleared at the first real audio / error branch). + await clearLongTurnFiller(); this.endSpeakingWithGrace(); // Drop the per-turn abort controller so the next turn starts with a // fresh one and barge-ins on the next turn cannot accidentally fire diff --git a/libraries/typescript/src/types.ts b/libraries/typescript/src/types.ts index 5a9bb26..8f37186 100644 --- a/libraries/typescript/src/types.ts +++ b/libraries/typescript/src/types.ts @@ -489,6 +489,28 @@ export interface BackgroundAudioPlayer { * (see ``Patter.agent()`` for the resolution). * 3. Otherwise, the AgentOptions default is used. 
*/ +/** + * Per-call context handed to a ``sessionKeyFactory`` (see + * {@link OpenAICompatibleLLMOptions.sessionKeyFactory}). + * + * A session-aware LLM provider (e.g. the Hermes preset) can derive its + * memory-scope header value per call from this — most usefully from + * {@link SessionContext.callerHash}, a stable non-reversible hash of the + * caller, so one phone number maps to one durable memory namespace across calls + * WITHOUT the raw number ever being emitted or logged. + * + * All fields are optional: ``callId`` / ``caller`` / ``callee`` are present when + * the call provides them; ``callerHash`` is {@link hashCaller} of ``caller`` + * (``undefined`` when there is no caller). The raw ``caller`` is carried here + * only so a factory CAN re-derive its own scope — it must never be put on the + * wire or logged beyond what already exists. Mirrors Python ``SessionContext``. + */ +export interface SessionContext { + readonly callId?: string; + readonly caller?: string; + readonly callee?: string; + readonly callerHash?: string; +} /** Configuration for a local-mode voice AI agent (passed to `phone.agent({...})`). */ export interface AgentOptions { readonly systemPrompt: string; @@ -524,6 +546,26 @@ export interface AgentOptions { * Mirrors Python ``llm_error_message`` on ``Patter.agent()`` / ``Agent``. */ readonly llmErrorMessage?: string; + /** + * Opt-in short filler spoken when an LLM turn is SLOW (e.g. an agent runtime + * running tools / memory) and no audio has reached the carrier yet — DISTINCT + * from ``llmErrorMessage`` (which fires on an ERROR; this fires on SLOWNESS). + * When set to a non-empty string and the turn has produced NO audio after + * ``longTurnMessageAfterS`` seconds, the SDK synthesizes this line ONCE + * through the normal TTS turn lifecycle (subject to barge-in) to fill the + * gap. It never fires once real audio has started this turn, and never + * double-speaks. 
``undefined`` (default) keeps today's behaviour: nothing is + * spoken while a slow turn runs. Pipeline mode only. Mirrors Python + * ``long_turn_message`` on ``Patter.agent()`` / ``Agent``. + */ + readonly longTurnMessage?: string; + /** + * Seconds to wait after the turn begins speaking before the + * ``longTurnMessage`` filler fires (only consulted when ``longTurnMessage`` + * is set and no audio has reached the carrier yet). Default ``4.0``. Mirrors + * Python ``long_turn_message_after_s``. + */ + readonly longTurnMessageAfterS?: number; /** Tool definitions — ``Tool`` class instances from ``getpatter``. */ readonly tools?: ReadonlyArray; /** diff --git a/libraries/typescript/tests/llm-namespace-exports.test.ts b/libraries/typescript/tests/llm-namespace-exports.test.ts new file mode 100644 index 0000000..688abb0 --- /dev/null +++ b/libraries/typescript/tests/llm-namespace-exports.test.ts @@ -0,0 +1,50 @@ +/** + * Tests for the ``hermes`` / ``openclaw`` / ``openaiCompatible`` namespace + * objects (Feature #6), mirroring the Python ``from getpatter.llm import + * hermes`` ergonomics. + * + * Real construction throughout — no mocks. Proves ``new hermes.LLM()`` builds + * the SAME class as the existing ``HermesLLM`` named export, and that the named + * exports still work alongside the namespaces. + */ + +import { describe, expect, it } from 'vitest'; +import { + hermes, + openclaw, + openaiCompatible, + HermesLLM, + OpenClawLLM, + OpenAICompatibleLLM, +} from '../src'; + +describe('[unit] LLM namespace exports', () => { + it('hermes.LLM constructs the same class as the HermesLLM named export', () => { + const fromNamespace = new hermes.LLM(); + expect(fromNamespace).toBeInstanceOf(HermesLLM); + // Same constructor identity — the namespace re-exports the class, not a copy. 
+ expect(hermes.LLM).toBe(HermesLLM); + expect(fromNamespace.model).toBe('hermes-agent'); + }); + + it('openclaw.LLM constructs the same class as the OpenClawLLM named export', () => { + const fromNamespace = new openclaw.LLM({ agent: 'x' }); + expect(fromNamespace).toBeInstanceOf(OpenClawLLM); + expect(openclaw.LLM).toBe(OpenClawLLM); + expect(fromNamespace.model).toBe('openclaw/x'); + }); + + it('openaiCompatible.LLM constructs the same class as the named export', () => { + const fromNamespace = new openaiCompatible.LLM({ + baseUrl: 'http://127.0.0.1:11434/v1', + model: 'llama3.1', + }); + expect(fromNamespace).toBeInstanceOf(OpenAICompatibleLLM); + expect(openaiCompatible.LLM).toBe(OpenAICompatibleLLM); + expect(fromNamespace.model).toBe('llama3.1'); + }); + + it('openclaw.LLM enforces the same agent-id validation as the named export', () => { + expect(() => new openclaw.LLM({ agent: 'a b' })).toThrow(/agent id/i); + }); +}); diff --git a/libraries/typescript/tests/llm-session-key-factory.mocked.test.ts b/libraries/typescript/tests/llm-session-key-factory.mocked.test.ts new file mode 100644 index 0000000..deb2ef9 --- /dev/null +++ b/libraries/typescript/tests/llm-session-key-factory.mocked.test.ts @@ -0,0 +1,286 @@ +/** + * Tests for the per-call session-key factory (Feature #7). + * + * A ``sessionKeyFactory`` derives the memory-scope header value per call from a + * {@link SessionContext} (carrying ``caller`` + its non-reversible + * {@link hashCaller}). The Hermes convenience ``sessionKeyFrom: 'caller_hash'`` + * installs a default factory that scopes durable memory per caller WITHOUT the + * raw number ever reaching the wire. + * + * Factory resolution, the SessionContext construction, and the caller threading + * through the REAL ``LLMLoop`` are all real code. The only mocked surface is + * ``global.fetch`` — used to inspect the request the provider would POST (the + * X-Hermes-Session-Key header value) without touching the network. 
+ */ + +import { describe, expect, it, vi, afterEach } from 'vitest'; +import { + OpenAICompatibleLLMProvider, + hashCaller, +} from '../src/llm/openai-compatible'; +import { LLM as HermesLLM } from '../src/llm/hermes'; +import { LLMLoop } from '../src/llm-loop'; +import type { + LLMChunk, + LLMProvider, + LLMStreamOptions, +} from '../src/llm-loop'; +import type { SessionContext } from '../src/types'; + +const originalFetch = globalThis.fetch; + +afterEach(() => { + globalThis.fetch = originalFetch; + vi.restoreAllMocks(); +}); + +/** Capture the single fetch a provider issues, returning a 200 + empty body. */ +function captureFetch(): { calls: Array<{ url: string; init: RequestInit }> } { + const calls: Array<{ url: string; init: RequestInit }> = []; + globalThis.fetch = vi.fn( + async (url: string | URL | Request, init?: RequestInit) => { + calls.push({ url: String(url), init: init ?? {} }); + return new Response('', { status: 200 }); + }, + ) as unknown as typeof fetch; + return { calls }; +} + +async function inspectHeaders( + provider: { + stream: ( + m: Array>, + t?: unknown, + o?: LLMStreamOptions, + ) => AsyncGenerator; + }, + opts?: LLMStreamOptions, +): Promise> { + const { calls } = captureFetch(); + for await (const _ of provider.stream( + [{ role: 'user', content: 'hi' }], + null, + opts, + )) { + // drain + } + return calls[0].init.headers as Record; +} + +// --------------------------------------------------------------------------- +// hashCaller — stable, non-reversible, never the raw number +// --------------------------------------------------------------------------- + +describe('[unit] hashCaller', () => { + it('is stable across calls and never the raw number', () => { + const number = '+15555550100'; + const h1 = hashCaller(number); + const h2 = hashCaller(number); + expect(h1).toBe(h2); + expect(h1).toHaveLength(16); + expect(h1).toMatch(/^[0-9a-f]{16}$/); + expect(h1).not.toContain(number); + expect(h1).not.toBe(number); + }); + + 
it('distinguishes different callers and returns undefined for empty input', () => { + expect(hashCaller('+15555550100')).not.toBe(hashCaller('+15555550101')); + expect(hashCaller(undefined)).toBeUndefined(); + expect(hashCaller('')).toBeUndefined(); + }); +}); + +// --------------------------------------------------------------------------- +// Factory precedence on the generic provider — observed on the wire +// --------------------------------------------------------------------------- + +describe('[mocked] sessionKeyFactory on OpenAICompatibleLLMProvider', () => { + it('factory overrides the static sessionKey and sees the caller hash', async () => { + let seen: SessionContext | undefined; + const provider = new OpenAICompatibleLLMProvider({ + baseUrl: 'http://127.0.0.1:9/v1', + model: 'm', + sessionKeyHeader: 'X-Mem', + sessionKey: 'static-key', // must be overridden + sessionKeyFactory: (ctx) => { + seen = ctx; + return `scope-${ctx.callerHash}`; + }, + }); + + const headers = await inspectHeaders(provider, { + callId: 'c1', + caller: '+15555550100', + callee: '+15555550101', + }); + + const expectedHash = hashCaller('+15555550100'); + expect(headers['X-Mem']).toBe(`scope-${expectedHash}`); + // The factory saw the full context; the EMITTED value carries only the hash. 
+ expect(seen?.callId).toBe('c1'); + expect(seen?.caller).toBe('+15555550100'); + expect(seen?.callee).toBe('+15555550101'); + expect(seen?.callerHash).toBe(expectedHash); + }); + + it('factory returning undefined omits the header', async () => { + const provider = new OpenAICompatibleLLMProvider({ + baseUrl: 'http://127.0.0.1:9/v1', + model: 'm', + sessionKeyHeader: 'X-Mem', + sessionKey: 'static-key', + sessionKeyFactory: () => undefined, + }); + + const headers = await inspectHeaders(provider, { + callId: 'c1', + caller: '+15555550100', + }); + expect(headers['X-Mem']).toBeUndefined(); + }); + + it('static sessionKey is used when no factory is configured', async () => { + const provider = new OpenAICompatibleLLMProvider({ + baseUrl: 'http://127.0.0.1:9/v1', + model: 'm', + sessionKeyHeader: 'X-Mem', + sessionKey: 'static-key', + }); + + const headers = await inspectHeaders(provider, { + callId: 'c1', + caller: '+15555550100', + }); + expect(headers['X-Mem']).toBe('static-key'); + }); + + it('factory fires even without a callId (keys off the caller hash alone)', async () => { + const provider = new OpenAICompatibleLLMProvider({ + baseUrl: 'http://127.0.0.1:9/v1', + model: 'm', + sessionKeyHeader: 'X-Mem', + sessionKeyFactory: (ctx) => `caller-${ctx.callerHash}`, + }); + + const headers = await inspectHeaders(provider, { caller: '+15555550100' }); + expect(headers['X-Mem']).toBe(`caller-${hashCaller('+15555550100')}`); + }); +}); + +// --------------------------------------------------------------------------- +// Hermes convenience: sessionKeyFrom: 'caller_hash' +// --------------------------------------------------------------------------- + +describe('[mocked] HermesLLM sessionKeyFrom convenience', () => { + it('caller_hash derives patter-caller- on the wire (raw number absent)', async () => { + const llm = new HermesLLM({ sessionKeyFrom: 'caller_hash' }); + const caller = '+15555550100'; + const headers = await inspectHeaders(llm, { callId: 'hid-1', caller }); 
+ + const expected = `patter-caller-${hashCaller(caller)}`; + expect(headers['X-Hermes-Session-Key']).toBe(expected); + // The raw number is never in the memory-scope header value. + expect(headers['X-Hermes-Session-Key']).not.toContain(caller); + // Per-call session id still flows alongside the memory scope. + expect(headers['X-Hermes-Session-Id']).toBe('patter-call-hid-1'); + }); + + it('omits the memory-scope header when there is no caller', async () => { + const llm = new HermesLLM({ sessionKeyFrom: 'caller_hash' }); + const headers = await inspectHeaders(llm, { callId: 'hid-1' }); + expect(headers['X-Hermes-Session-Key']).toBeUndefined(); + expect(headers['X-Hermes-Session-Id']).toBe('patter-call-hid-1'); + }); + + it('an explicit sessionKeyFactory wins over sessionKeyFrom', async () => { + const llm = new HermesLLM({ + sessionKeyFrom: 'caller_hash', + sessionKeyFactory: () => 'custom-scope', + }); + const headers = await inspectHeaders(llm, { + callId: 'hid-1', + caller: '+15555550100', + }); + expect(headers['X-Hermes-Session-Key']).toBe('custom-scope'); + }); + + it('rejects an unknown sessionKeyFrom at runtime (parity with Python ValueError)', () => { + // A non-TypeScript / dynamic-JS caller can pass an invalid value the literal + // type would reject only at compile time; it must fail loudly, not silently. + expect( + () => new HermesLLM({ sessionKeyFrom: 'bogus' as unknown as 'caller_hash' }), + ).toThrow(/caller_hash/); + }); +}); + +// --------------------------------------------------------------------------- +// Caller threads through the REAL LLMLoop into the provider's stream() +// --------------------------------------------------------------------------- + +/** Records the stream options it received and yields a single text chunk. 
*/ +class RecordingProvider implements LLMProvider { + static readonly providerKey = 'recording'; + public lastOpts: LLMStreamOptions | undefined; + + async *stream( + _messages: Array>, + _tools?: Array> | null, + opts?: LLMStreamOptions, + ): AsyncGenerator { + this.lastOpts = opts; + yield { type: 'text', content: 'ok' }; + } +} + +async function drain( + gen: AsyncGenerator, +): Promise { + let out = ''; + for await (const tok of gen) out += tok; + return out; +} + +describe('[unit] LLMLoop threads caller/callee into stream opts', () => { + it('forwards caller and callee from call_context into provider.stream opts', async () => { + const provider = new RecordingProvider(); + const loop = new LLMLoop('', 'm', 'be helpful', null, provider); + + const out = await drain( + loop.run('hi', [], { + call_id: 'xyz', + caller: '+15555550100', + callee: '+15555550101', + }), + ); + + expect(out).toBe('ok'); + expect(provider.lastOpts?.callId).toBe('xyz'); + expect(provider.lastOpts?.caller).toBe('+15555550100'); + expect(provider.lastOpts?.callee).toBe('+15555550101'); + }); + + it('leaves opts untouched when no call context is present', async () => { + const provider = new RecordingProvider(); + const loop = new LLMLoop('', 'm', 'be helpful', null, provider); + + await drain(loop.run('hi', [], {})); + + expect(provider.lastOpts?.callId).toBeUndefined(); + expect(provider.lastOpts?.caller).toBeUndefined(); + expect(provider.lastOpts?.callee).toBeUndefined(); + }); + + it('end-to-end: a Hermes caller_hash key reaches the wire via the real loop', async () => { + const llm = new HermesLLM({ sessionKeyFrom: 'caller_hash' }); + const loop = new LLMLoop('', 'hermes-agent', 'be helpful', null, llm); + + const { calls } = captureFetch(); + const caller = '+15555550100'; + await drain(loop.run('hi', [], { call_id: 'c9', caller, callee: '+15555550101' })); + + const headers = calls[0].init.headers as Record; + expect(headers['X-Hermes-Session-Key']).toBe( + 
`patter-caller-${hashCaller(caller)}`, + ); + }); +}); diff --git a/libraries/typescript/tests/long-turn-filler.mocked.test.ts b/libraries/typescript/tests/long-turn-filler.mocked.test.ts new file mode 100644 index 0000000..2729413 --- /dev/null +++ b/libraries/typescript/tests/long-turn-filler.mocked.test.ts @@ -0,0 +1,341 @@ +/** + * [mocked] Pipeline-mode opt-in long-turn filler (longTurnMessage, Feature #8). + * + * Exercises the REAL pipeline turn path: + * STT final → processTranscript → runPipelineLlm → real LLMLoop.run → + * provider.stream() is SLOW (agent runtime running tools) → the scheduled + * long-turn filler fires after ``longTurnMessageAfterS`` → spoken via the same + * per-sentence TTS primitive (synthesizeSentence) every normal sentence uses. + * + * AUTHENTIC: the StreamHandler, the real ``LLMLoop`` (constructed inside + * ``initPipeline`` from ``agent.llm``), the sentence chunker, the filler + * scheduling / cancellation, and the TTS-send path are REAL. The ONLY mocked + * surfaces are the two external boundaries: + * 1. The LLM provider's ``stream()`` — its TIMING / the paid gateway hop — + * stubbed to delay (or not) before yielding text. + * 2. The TTS byte stream (ElevenLabsTTS ``synthesizeStream``) — replaced with + * PCM Buffers so audio-out is observable. + * Everything inward runs unmodified. 
+ */ + +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { StreamHandler } from '../src/stream-handler'; +import type { TelephonyBridge, StreamHandlerDeps } from '../src/stream-handler'; +import { MetricsStore } from '../src/dashboard/store'; +import { RemoteMessageHandler } from '../src/remote-message'; +import type { AgentOptions } from '../src/types'; +import type { LLMProvider, LLMChunk, LLMStreamOptions } from '../src/llm-loop'; +import type { WebSocket as WSWebSocket } from 'ws'; + +const FILLER = 'One moment while I look that up.'; + +vi.mock('../src/providers/elevenlabs-tts', async (importOriginal) => { + const original = + await importOriginal(); + return { + ...original, + ElevenLabsTTS: vi.fn().mockImplementation(() => ({ + synthesizeStream: vi.fn(async function* () { + yield Buffer.from('tts-audio'); + }), + })), + }; +}); + +vi.mock('../src/dashboard/persistence', () => ({ + notifyDashboard: vi.fn(), +})); + +import { ElevenLabsTTS } from '../src/providers/elevenlabs-tts'; + +function makeMockWs(): WSWebSocket { + return { + send: vi.fn(), + close: vi.fn(), + on: vi.fn(), + once: vi.fn(), + readyState: 1, + removeListener: vi.fn(), + addEventListener: vi.fn(), + removeEventListener: vi.fn(), + } as unknown as WSWebSocket; +} + +function makeMockStt() { + let transcriptCb: + | ((t: { isFinal?: boolean; text?: string }) => Promise) + | undefined; + return { + connect: vi.fn().mockResolvedValue(undefined), + close: vi.fn(), + sendAudio: vi.fn(), + onTranscript: vi.fn( + (cb: (t: { isFinal?: boolean; text?: string }) => Promise) => { + transcriptCb = cb; + }, + ), + get requestId() { + return 'stt-filler-req'; + }, + emitTranscript(text: string): Promise | undefined { + return transcriptCb?.({ isFinal: true, text }); + }, + }; +} + +function makeTwilioBridge( + mockStt: ReturnType, +): TelephonyBridge { + return { + label: 'Twilio', + telephonyProvider: 'twilio', + sendAudio: vi.fn(), + sendMark: vi.fn(), + sendClear: 
vi.fn(), + transferCall: vi.fn().mockResolvedValue(undefined), + endCall: vi.fn().mockResolvedValue(undefined), + createStt: vi.fn().mockReturnValue(mockStt), + queryTelephonyCost: vi.fn().mockResolvedValue(undefined), + } as unknown as TelephonyBridge; +} + +/** + * A real ``LLMProvider`` whose ``stream()`` timing is the only mocked surface. + * - ``delayMs``: wait this long before yielding the (single, complete) reply + * sentence. A delay past ``longTurnMessageAfterS`` lets the filler fire; a + * zero delay starts real audio before the filler timer. + */ +function makeSlowProvider(delayMs: number, reply = 'Here is your answer. '): LLMProvider { + return { + model: 'agent-runtime-1', + async *stream( + _messages: Array<Record<string, unknown>>, + _tools?: Array<Record<string, unknown>> | null, + _opts?: LLMStreamOptions, + ): AsyncGenerator<LLMChunk> { + if (delayMs > 0) { + await new Promise((r) => setTimeout(r, delayMs)); + } + yield { type: 'text', content: reply }; + }, + } as unknown as LLMProvider; +} + +function setupTtsMock(): { calls: string[] } { + const calls: string[] = []; + const MockTTS = ElevenLabsTTS as unknown as ReturnType<typeof vi.fn>; + MockTTS.mockImplementation(() => ({ + synthesizeStream: vi.fn(async function* (text: string) { + calls.push(text); + yield Buffer.from('pcm-chunk-1'); + yield Buffer.from('pcm-chunk-2'); + }), + })); + return { calls }; +} + +function makeDeps( + bridge: TelephonyBridge, + agentOverrides: Partial<AgentOptions>, +): StreamHandlerDeps { + const mockTts = new (ElevenLabsTTS as unknown as new ( + key: string, + voice?: string, + ) => { synthesizeStream: (t: string) => AsyncIterable<Buffer> })( + 'el-key', + 'rachel', + ); + const agent: AgentOptions = { + systemPrompt: 'You are a test pipeline agent.', + provider: 'pipeline', + tts: mockTts as unknown as AgentOptions['tts'], + ...agentOverrides, + } as AgentOptions; + return { + config: {}, + agent, + bridge, + metricsStore: new MetricsStore(), + pricing: null, + remoteHandler: new RemoteMessageHandler(), + recording: false, + buildAIAdapter: vi.fn(),
sanitizeVariables: vi.fn((raw: Record<string, unknown>) => { + const safe: Record<string, string> = {}; + for (const [k, v] of Object.entries(raw)) safe[k] = String(v); + return safe; + }), + resolveVariables: vi.fn((tpl: string) => tpl), + } as unknown as StreamHandlerDeps; +} + +describe('[mocked] pipeline long-turn filler (longTurnMessage)', () => { + beforeEach(() => { + vi.spyOn(globalThis, 'fetch').mockResolvedValue({ + ok: true, + status: 200, + json: async () => ({}), + text: async () => '', + } as Response); + const MockTTS = ElevenLabsTTS as unknown as ReturnType<typeof vi.fn>; + MockTTS.mockClear(); + MockTTS.mockImplementation(() => ({ + synthesizeStream: vi.fn(async function* () { + yield Buffer.from('tts-audio'); + }), + })); + }); + + afterEach(() => { + vi.restoreAllMocks(); + }); + + it('speaks the filler when the turn is slow, then the real reply', async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + const { calls: ttsCalls } = setupTtsMock(); + + // Provider takes 120 ms before its first word; the filler fires at ~20 ms. + const deps = makeDeps(bridge, { + llm: makeSlowProvider(120) as unknown as AgentOptions['llm'], + longTurnMessage: FILLER, + longTurnMessageAfterS: 0.02, + }); + const handler = new StreamHandler( + deps, + makeMockWs(), + '+15551111111', + '+15552222222', + ); + + await handler.handleCallStart('CA-filler-slow'); + await stt.emitTranscript('What is the weather?'); + + await vi.waitFor(() => expect(ttsCalls).toContain(FILLER), { + timeout: 5000, + }); + // The filler was spoken FIRST, then the real reply — exactly one filler.
+ expect(ttsCalls.indexOf(FILLER)).toBe(0); + expect(ttsCalls).toContain('Here is your answer.'); + expect(ttsCalls.filter((t) => t === FILLER)).toHaveLength(1); + expect( + (bridge.sendAudio as ReturnType<typeof vi.fn>).mock.calls.length, + ).toBeGreaterThanOrEqual(1); + }, 10000); + + it('does NOT speak the filler when real audio starts quickly (no double-speak)', async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + const { calls: ttsCalls } = setupTtsMock(); + + // Reply is immediate; the filler timer (500 ms) must be cleared before firing. + const deps = makeDeps(bridge, { + llm: makeSlowProvider(0) as unknown as AgentOptions['llm'], + longTurnMessage: FILLER, + longTurnMessageAfterS: 0.5, + }); + const handler = new StreamHandler( + deps, + makeMockWs(), + '+15551111111', + '+15552222222', + ); + + await handler.handleCallStart('CA-filler-fast'); + await stt.emitTranscript('What is the weather?'); + + await vi.waitFor(() => expect(ttsCalls).toContain('Here is your answer.'), { + timeout: 5000, + }); + // Give the (cleared) timer ample chance to (not) fire. + await new Promise((r) => setTimeout(r, 600)); + + expect(ttsCalls).not.toContain(FILLER); + }, 10000); + + it('speaks NOTHING extra when longTurnMessage is unset (feature OFF)', async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + const { calls: ttsCalls } = setupTtsMock(); + + // Slow turn but no longTurnMessage → no filler, behaviour unchanged. + const deps = makeDeps(bridge, { + llm: makeSlowProvider(80) as unknown as AgentOptions['llm'], + }); + const handler = new StreamHandler( + deps, + makeMockWs(), + '+15551111111', + '+15552222222', + ); + + await handler.handleCallStart('CA-filler-off'); + await stt.emitTranscript('What is the weather?'); + + await vi.waitFor(() => expect(ttsCalls).toContain('Here is your answer.'), { + timeout: 5000, + }); + await new Promise((r) => setTimeout(r, 150)); + + // Only the real reply — the filler never appears.
+ expect(ttsCalls).toEqual(['Here is your answer.']); + }, 10000); + + it('authenticity: stubbing synthesizeSentence to throw makes the filler emit no audio', async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + setupTtsMock(); + + const deps = makeDeps(bridge, { + llm: makeSlowProvider(120) as unknown as AgentOptions['llm'], + longTurnMessage: FILLER, + longTurnMessageAfterS: 0.02, + }); + const handler = new StreamHandler( + deps, + makeMockWs(), + '+15551111111', + '+15552222222', + ); + + // Break the REAL speak primitive only for the filler line. Its scheduled + // call is wrapped in a .catch(), so the turn must not crash, but no filler + // PCM can reach the carrier — proving the positive test exercised the real + // synthesizeSentence rather than a mock. + const realSynth = ( + handler as unknown as { + synthesizeSentence: (...args: unknown[]) => Promise; + } + ).synthesizeSentence.bind(handler); + const synthSpy = vi + .spyOn( + handler as unknown as { + synthesizeSentence: (...args: unknown[]) => Promise; + }, + 'synthesizeSentence', + ) + .mockImplementation(async (...args: unknown[]) => { + if (args[0] === FILLER) { + throw new Error('synthesizeSentence disabled for filler'); + } + return realSynth(...args); + }); + + await handler.handleCallStart('CA-filler-authentic'); + await stt.emitTranscript('What is the weather?'); + + // The real reply still plays; the broken filler degraded to silence. 
+ await vi.waitFor(() => expect(synthSpy).toHaveBeenCalledWith( + FILLER, + expect.anything(), + expect.anything(), + expect.anything(), + ), { timeout: 5000 }); + await vi.waitFor(() => expect(synthSpy).toHaveBeenCalledWith( + 'Here is your answer.', + expect.anything(), + expect.anything(), + expect.anything(), + ), { timeout: 5000 }); + }, 10000); +}); From 526d8feb6f40be5080e9cdb1d0630b3dc9db4eb7 Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Sat, 6 Jun 2026 16:02:30 +0200 Subject: [PATCH 02/11] =?UTF-8?q?fix(pipeline):=20multi-turn=20turn-taking?= =?UTF-8?q?=20=E2=80=94=20tail-grace=20next-turn=20rescue=20+=20per-turn?= =?UTF-8?q?=20cancel-event=20reset?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'. Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity: 1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". 
A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn — recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight. 2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response — AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn — so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm. Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high. 
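
Illustration only (not part of the change): a minimal sketch of the speech_start classification described in root cause 1, assuming the decision depends only on the two flags. The names SpeechStartAction and classify_speech_start are hypothetical and do not exist in the SDK; the real handler also applies the _can_barge_in() warmup gate, which is omitted here.

```python
from enum import Enum


class SpeechStartAction(Enum):
    """Hypothetical labels for how a VAD speech_start could be handled."""

    FORWARD_TO_STT = "forward_to_stt"  # agent idle: ordinary user speech
    NEW_TURN = "new_turn"  # tail grace: rescue as the user's next turn
    BARGE_IN = "barge_in"  # active TTS: a real interruption


def classify_speech_start(is_speaking: bool, tail_grace_active: bool) -> SpeechStartAction:
    """Hypothetical helper mirroring the flag logic in the commit message."""
    if not is_speaking:
        # Nothing is playing, so the audio simply flows to STT.
        return SpeechStartAction.FORWARD_TO_STT
    if tail_grace_active:
        # The agent already finished; only the echo guard is still held.
        return SpeechStartAction.NEW_TURN
    # TTS is still streaming, so this is a genuine barge-in.
    return SpeechStartAction.BARGE_IN
```

Before this fix, the second branch did not exist, so a reply arriving 200-700 ms after the agent stopped fell through to the barge-in path.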
--- CHANGELOG.md | 6 + libraries/python/getpatter/stream_handler.py | 130 +++++++++- .../test_pipeline_multiturn_tail_grace.py | 244 ++++++++++++++++++ libraries/typescript/src/stream-handler.ts | 82 ++++++ ...peline-multiturn-tail-grace.mocked.test.ts | 187 ++++++++++++++ 5 files changed, 642 insertions(+), 7 deletions(-) create mode 100644 libraries/python/tests/unit/test_pipeline_multiturn_tail_grace.py create mode 100644 libraries/typescript/tests/pipeline-multiturn-tail-grace.mocked.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index b727355..6baed1d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,12 @@ - **`session_key_factory` / `sessionKeyFactory` — per-call long-term memory scope from a caller hash.** `OpenAICompatibleLLM` (and `HermesLLM`) can derive the `X-Hermes-Session-Key` header per call from a `SessionContext` (`call_id` / `caller` / `callee` / `caller_hash`) instead of a static value, so an agent runtime can remember a caller across calls **without the raw phone number ever reaching the wire or the logs**. Shortcut `HermesLLM(session_key_from="caller_hash")` installs a default `patter-caller-` factory (SHA-256, 16 hex chars). New public `SessionContext` + `hash_caller` / `hashCaller` helper. The factory takes precedence over the static `session_key`; a falsy return omits the header. The loop dispatch was generalised to thread `caller` / `callee` only to providers whose `stream()` declares them (or `**kwargs`), keeping built-in and minimal custom providers unchanged. `libraries/python/getpatter/models.py`, `.../llm/openai_compatible.py`, `.../llm/hermes.py`, `.../services/llm_loop.py` + TypeScript mirrors. - **`long_turn_message` / `longTurnMessage` — opt-in spoken filler during a slow turn.** When an LLM turn takes longer than `long_turn_message_after_s` (default 4 s) and no audio has reached the caller yet, Patter speaks a short configurable line (e.g. 
"One moment, let me check.") instead of dead silence — useful for agent runtimes (Hermes / OpenClaw) that run tools mid-turn. Distinct from `llm_error_message` (which fires on error): this fires on **slowness**, once per turn, gated on emitted audio so it never double-speaks. `None` / unset = off (no behaviour change). `libraries/python/getpatter/models.py`, `.../stream_handler.py`, `.../client.py` + TypeScript mirrors. +### Fixed + +- **Multi-turn pipeline conversations no longer go silent after the first turn.** The agent answered the first turn but then ignored every subsequent utterance, leaving a ghost metrics turn of `user_text='' agent_text='[interrupted]'`. Two root causes in the pipeline turn-taking state machine: + - **Tail-grace misclassified the next turn as a barge-in.** After the agent finishes speaking, `_end_speaking_with_grace` keeps `_is_speaking=true` for `PATTER_TTS_TAIL_GRACE_MS` (default 1500 ms) to swallow the fading TTS echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was treated as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a ≤260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. A new `_tail_grace_active` / `tailGraceActive` flag now distinguishes "actively streaming TTS" from "post-TTS echo guard"; a VAD `speech_start` (or a transcript) during the tail grace ends the grace and is dispatched as a clean new turn — recovering the leading audio from the ring instead of dropping it — with no spurious `record_turn_interrupted`. Tunable `PATTER_TTS_TAIL_GRACE_MS` (0 / 200 / 1500) is now safe for fast next-turn speech. 
+ - **(Python) A barge-in's per-turn cancel event leaked into the next turn.** `_llm_cancel_event` was only recreated *inside* `_process_streaming_response` — after `LLMLoop.run` had already been handed the (still-set) event for the next turn — so the turn following any real barge-in bailed immediately. The event is now recreated at the top of `_dispatch_turn`, before dispatch (TypeScript already allocated a fresh `AbortController` per turn). `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`. + ## 0.6.5 (2026-06-05) ### Added diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index 7967e54..beb368f 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -2348,6 +2348,20 @@ def __init__( self._auto_vad = None self._stt_task: asyncio.Task | None = None self._is_speaking = False + # True only while the post-TTS tail-grace window is pending: the + # agent has finished its turn but ``_is_speaking`` is still held for + # ``PATTER_TTS_TAIL_GRACE_MS`` to swallow the fading echo tail. A VAD + # ``speech_start`` (or a transcript) during this window is the user's + # NEXT turn, not a barge-in — there is nothing left to interrupt. Set + # by ``_end_speaking_with_grace``; cleared by ``_begin_speaking``, the + # grace flip, barge-in cancels, and ``_end_tail_grace_for_new_turn``. + self._tail_grace_active = False + # Handle to the scheduled grace-flip task so it can be cancelled + # (parity with TS ``clearGraceTimer``) — at most one pending at a + # time. The ``_speaking_generation`` guard already makes a stale flip + # a no-op; cancelling avoids leaving an idle ``asyncio.sleep`` task + # per turn on long, fast-turn calls. + self._grace_task: asyncio.Task | None = None # Per-turn LLM cancel event. 
Recreated on every new turn before LLM # consumption so a stale cancel from a previous turn cannot terminate # the next stream prematurely. Initialized here so the STT loop's @@ -3195,12 +3209,14 @@ async def _process_streaming_response(self, result, call_id: str) -> str: first_tts_chunk, hook_executor, hook_ctx ) - # Reset the per-turn LLM cancel event so a stale cancel from a - # previous turn cannot terminate this stream prematurely. The - # event is *set* by ``_handle_barge_in`` to break out of the - # consumption loop and close the generator (which propagates - # cancellation into the LLM provider's HTTP/WS connection). - self._llm_cancel_event = asyncio.Event() + # NOTE: the per-turn ``_llm_cancel_event`` is reset at the TOP of + # ``_dispatch_turn`` (before ``LLMLoop.run`` is handed the event), not + # here. Recreating it at this point — after ``run`` already captured + # the previous reference — used to leave the generator bound to a + # different event object than the consumption loop reads, and left a + # barge-in's set event leaking into the next turn. The event is *set* + # by ``_handle_barge_in`` to break out of the loop below and close the + # generator (propagating cancellation into the provider connection). interrupted = False llm_error = False @@ -3426,6 +3442,17 @@ async def _handle_barge_in(self, transcript) -> None: """ if not (transcript.text and self._is_speaking): return + # Defensive ``getattr`` — test fixtures build the handler via + # ``object.__new__`` and skip ``__init__`` (no tail-grace state). + if getattr(self, "_tail_grace_active", False): + # A transcript arriving during the post-TTS tail grace is the + # next turn, not a barge-in (the agent already finished). End the + # grace and return WITHOUT cancelling — the same transcript then + # flows on to ``_commit_transcript``/``_dispatch_turn`` as a + # normal new turn. Closes the race where a transcript lands + # before the VAD speech_start rescue fires. 
+ await self._end_tail_grace_for_new_turn() + return if not self._can_barge_in(): aec_state = "on" if getattr(self, "_aec", None) is not None else "off" logger.info( @@ -3487,6 +3514,7 @@ async def _do_cancel_for_barge_in(self, transcript_text: str) -> None: {"patter.call.id": self.call_id}, ): self._is_speaking = False + self._tail_grace_active = False self._speaking_started_at = None self._first_audio_sent_at = None self._last_cancel_at = time.time() @@ -3671,6 +3699,18 @@ async def _dispatch_turn(self, transcript_text: str) -> None: """Run the post-commit pipeline (record STT → afterTranscribe → LLM dispatch → TTS → turn-complete) inline on the STT loop. """ + # Reset the per-turn LLM cancel event BEFORE dispatch so a stale + # cancel set by a previous turn's barge-in (``_do_cancel_for_barge_in`` + # calls ``cancel_event.set()``) cannot terminate this turn's LLM + # stream the instant it starts. This must happen before + # ``self._llm_loop.run(..., cancel_event=self._llm_cancel_event)`` is + # handed the event — recreating it later (inside + # ``_process_streaming_response``) was too late: ``run`` had already + # captured the set event, so the next turn after any barge-in went + # silent. Parity with TS, which allocates a fresh ``AbortController`` + # per turn in ``runPipelineLlm``. + self._llm_cancel_event = asyncio.Event() + # Record one STT span per final transcript turn. The span is # short-lived (just the attribute set) because STT is # streaming — we do not re-wrap the long-lived iterator. @@ -3927,6 +3967,20 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: vad_event = None if vad_event is not None: if vad_event.type == "speech_start": + # Tail-grace new-turn rescue: the agent already finished + # its turn and we are only in the post-TTS echo-guard + # window. 
A VAD speech_start here is the user's next turn, + # not a barge-in — end the grace synchronously so this + # utterance flows to STT as a clean new turn instead of + # being swallowed by the self-hearing guard or mislabelled + # as an empty ``[interrupted]`` turn (the multi-turn + # silence bug). After this ``_is_speaking`` is False, so + # the if/elif below is a no-op and the frame falls through + # to STT. Parity with TS ``endTailGraceForNewTurn``. + if self._is_speaking and getattr( + self, "_tail_grace_active", False + ): + await self._end_tail_grace_for_new_turn() phantom_suppressed = self._is_speaking and not self._can_barge_in() if phantom_suppressed: # Within the per-turn warmup gate. With AEC on @@ -4107,6 +4161,10 @@ async def _begin_speaking(self, is_first_message: bool = False) -> None: await asyncio.sleep(remaining) self._speaking_generation += 1 self._is_speaking = True + # A fresh turn is actively streaming — not in the post-TTS echo + # window. Clear the tail-grace flag so a VAD speech_start during this + # turn is treated as a real barge-in (not a new-turn rescue). + self._tail_grace_active = False self._speaking_started_at = time.time() # Stamp ``_first_audio_sent_at`` synchronously for EVERY turn so the # ``_can_barge_in()`` gate (250 ms anti-flicker for PSTN no-AEC) runs @@ -4207,6 +4265,7 @@ async def _end_speaking_with_grace(self) -> None: # ``_begin_speaking()``. if grace_ms <= 0: self._is_speaking = False + self._tail_grace_active = False self._speaking_started_at = None self._first_audio_sent_at = None self._clear_pending_barge_in() @@ -4218,6 +4277,14 @@ async def _end_speaking_with_grace(self) -> None: return gen = self._speaking_generation + # The agent has finished pushing audio; we now hold ``_is_speaking`` + # only to suppress the fading echo tail. Mark this as the tail-grace + # window so fast next-turn speech is rescued as a new turn rather + # than mis-detected as a barge-in. 
+ self._tail_grace_active = True + # Cancel any still-pending flip from a previous turn so at most one + # grace task is ever in flight (parity with TS ``clearGraceTimer``). + self._clear_grace_task() async def _flip_after_grace() -> None: try: @@ -4226,6 +4293,7 @@ async def _flip_after_grace() -> None: # newer turn would have bumped ``_speaking_generation``. if self._speaking_generation == gen: self._is_speaking = False + self._tail_grace_active = False self._speaking_started_at = None self._first_audio_sent_at = None self._clear_pending_barge_in() @@ -4242,7 +4310,52 @@ async def _flip_after_grace() -> None: except Exception as exc: # pragma: no cover - defensive logger.debug("tts grace flip failed: %s", exc) - asyncio.create_task(_flip_after_grace()) + self._grace_task = asyncio.create_task(_flip_after_grace()) + + def _clear_grace_task(self) -> None: + """Cancel the pending grace-flip task, if any. Idempotent; safe from + test fixtures built via ``object.__new__`` (no ``__init__``).""" + task = getattr(self, "_grace_task", None) + if task is not None and not task.done(): + task.cancel() + self._grace_task = None + + async def _end_tail_grace_for_new_turn(self) -> None: + """End the post-TTS tail-grace window because the user has begun + their next turn. + + Unlike a barge-in, the agent's response already played out in full — + there is nothing to cancel and no turn was interrupted. We flip the + speaking flag off (bumping ``_speaking_generation`` so the scheduled + grace-flip task no-ops), recover any leading audio the self-hearing + guard captured into the ring (the user's first ~250 ms, which VAD + needed before it could emit ``speech_start``), and let the live STT + stream take over. Crucially we do NOT call ``send_clear``, + ``record_bargein_detected`` or ``record_turn_interrupted`` — none of + those apply to a turn that completed normally. 
+ + Without this, fast next-turn speech (humans reply in 200-700 ms, well + inside the 1500 ms default grace) is withheld from STT and recorded + as an empty ``[interrupted]`` turn, after which the agent goes silent + for the rest of the call. + """ + self._is_speaking = False + self._tail_grace_active = False + self._speaking_started_at = None + self._first_audio_sent_at = None + # Invalidate the pending grace-flip task scheduled by + # ``_end_speaking_with_grace`` so it cannot later flip state on a turn + # that has already moved on (bump the generation AND cancel the task — + # parity with TS ``clearGraceTimer``). + self._speaking_generation += 1 + self._clear_grace_task() + self._clear_pending_barge_in() + await self._reset_barge_in_strategies() + # Recover the user's leading words. Same rationale as the barge-in + # flush — but here it is the only audio recovery, since the agent + # already stopped and no new TTS will overwrite it. + self._suppressed_speech_pending = False + await self._flush_inbound_audio_ring() async def _reset_barge_in_strategies(self) -> None: if not self._barge_in_strategies: @@ -4544,6 +4657,9 @@ async def cleanup(self) -> None: # spurious overlap_end events. Idempotent: safe to call when no # pending state exists. self._clear_pending_barge_in() + # Cancel any pending tail-grace flip task so it does not sleep past + # teardown and touch a finalised handler. + self._clear_grace_task() # Resolve every pending firstMessage mark future before tearing # down adapters. Without this, a call that ends abnormally mid # firstMessage (carrier WS drop, hangup during the paced sender) diff --git a/libraries/python/tests/unit/test_pipeline_multiturn_tail_grace.py b/libraries/python/tests/unit/test_pipeline_multiturn_tail_grace.py new file mode 100644 index 0000000..912713d --- /dev/null +++ b/libraries/python/tests/unit/test_pipeline_multiturn_tail_grace.py @@ -0,0 +1,244 @@ +"""Multi-turn regression tests for the pipeline turn-taking state machine. 
+ +Reproduces the live-call failure where the *first* turn works end-to-end but +every *subsequent* turn goes silent, leaving a ghost metrics turn of +``user_text='' agent_text='[interrupted]'``. + +Root causes covered here: + +1. **Tail-grace misclassification.** After the agent finishes a turn, + ``_end_speaking_with_grace`` keeps ``_is_speaking=True`` for + ``PATTER_TTS_TAIL_GRACE_MS`` (default 1500 ms) to swallow the fading TTS + echo tail. Humans reply in 200-700 ms — well inside that window — so the + user's next utterance was being mis-detected as a *barge-in*: + ``record_turn_interrupted`` fired (the ``[interrupted]`` ghost) and the + leading audio was withheld from STT (only a <=260 ms echo-contaminated + ring), so no final transcript was produced and the agent never answered. + The fix treats a VAD ``speech_start`` (or a transcript) during the tail + grace as the start of a NEW turn, not a barge-in. + +2. **Stale ``_llm_cancel_event``.** A real barge-in sets the per-turn cancel + event; it was only recreated *inside* ``_process_streaming_response`` — + AFTER ``LLMLoop.run`` had already been handed the (now set) event for the + next turn. The next turn's LLM stream then bailed immediately. The fix + recreates the event at the top of ``_dispatch_turn``, before dispatch. + +Only the external boundary is mocked (STT/TTS/audio sender). The VAD is a +scripted in-process double so the on_audio_received path runs unmocked. 
+""" + +from __future__ import annotations + +import asyncio +import os +import time +from collections import deque +from typing import AsyncIterator +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from getpatter.providers.base import VADEvent +from getpatter.stream_handler import PipelineStreamHandler + +from tests.conftest import make_agent + + +# --------------------------------------------------------------------------- +# Scripted in-process VAD — emits a caller-supplied event per frame +# --------------------------------------------------------------------------- + + +class _ScriptedVAD: + """Returns the next queued VADEvent (or None) on each ``process_frame``.""" + + def __init__(self, events: list[VADEvent | None]) -> None: + self._events = list(events) + self.reset_calls = 0 + + async def process_frame(self, pcm: bytes, sample_rate: int) -> VADEvent | None: + if self._events: + return self._events.pop(0) + return None + + async def close(self) -> None: # pragma: no cover - not exercised + pass + + def reset(self) -> None: + self.reset_calls += 1 + + +def _make_pipeline_handler(*, metrics: MagicMock | None = None) -> PipelineStreamHandler: + audio_sender = AsyncMock() + handler = PipelineStreamHandler( + agent=make_agent(), + audio_sender=audio_sender, + call_id="call-multiturn", + caller="+15551110000", + callee="+15552220000", + resolved_prompt="p", + metrics=metrics, + for_twilio=True, + on_transcript=None, + conversation_history=deque(maxlen=20), + transcript_entries=deque(maxlen=20), + ) + handler.on_message = None + handler._llm_loop = None + handler._stt = AsyncMock() + handler._aec = None + # Treat inbound as already-PCM16 16 kHz so on_audio_received skips the + # mulaw decode path (the scripted VAD ignores the bytes anyway). 
+ handler._input_is_mulaw_8k = False + return handler + + +def _enter_tail_grace(handler: PipelineStreamHandler) -> None: + """Put the handler into the post-TTS tail-grace window: the agent has + finished speaking but ``_is_speaking`` is still held for echo suppression. + ``_first_audio_sent_at`` is stamped in the past so ``_can_barge_in`` is + True (the warmup gate elapsed) — exactly the state that produced the bug. + """ + handler._is_speaking = True + handler._tail_grace_active = True + handler._speaking_generation = 1 + handler._speaking_started_at = time.time() - 2.0 + handler._first_audio_sent_at = time.time() - 2.0 + handler._inbound_audio_ring = [] + + +_FRAME = b"\x00\x01" * 160 # arbitrary PCM16 bytes; scripted VAD ignores content + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestTailGraceNewTurn: + """Speech during the tail grace is a new turn, not a barge-in.""" + + async def test_speech_during_tail_grace_reaches_stt_without_interrupt(self) -> None: + metrics = MagicMock() + handler = _make_pipeline_handler(metrics=metrics) + handler._auto_vad = _ScriptedVAD( + [None, None, VADEvent(type="speech_start"), None] + ) + _enter_tail_grace(handler) + + # Two leading frames while still in tail grace → buffered to ring, + # NOT yet forwarded to STT. + await handler.on_audio_received(_FRAME) + await handler.on_audio_received(_FRAME) + assert handler._stt.send_audio.await_count == 0 + assert len(handler._inbound_audio_ring) == 2 + + # VAD speech_start fires → tail grace ends as a NEW TURN. + await handler.on_audio_received(_FRAME) + + # Not a barge-in: the agent already finished, nothing was interrupted. + metrics.record_bargein_detected.assert_not_called() + handler.audio_sender.send_clear.assert_not_awaited() + assert handler._is_speaking is False + assert handler._tail_grace_active is False + + # Leading audio recovered (ring flushed) + the trigger frame sent live. 
+ assert handler._stt.send_audio.await_count >= 3 + + # A following frame now streams straight through to STT. + await handler.on_audio_received(_FRAME) + assert handler._stt.send_audio.await_count >= 4 + + async def test_active_tts_speech_still_barges_in(self) -> None: + """Regression guard: speech during *active* TTS (not tail grace) must + still trigger a real barge-in.""" + metrics = MagicMock() + handler = _make_pipeline_handler(metrics=metrics) + handler._auto_vad = _ScriptedVAD([VADEvent(type="speech_start")]) + # Active TTS: speaking, but NOT in the post-completion tail grace. + handler._is_speaking = True + handler._tail_grace_active = False + handler._speaking_generation = 1 + handler._speaking_started_at = time.time() - 2.0 + handler._first_audio_sent_at = time.time() - 2.0 + handler._inbound_audio_ring = [] + + await handler.on_audio_received(_FRAME) + + metrics.record_bargein_detected.assert_called_once() + handler.audio_sender.send_clear.assert_awaited_once() + assert handler._is_speaking is False + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestTailGraceFlagLifecycle: + """``_tail_grace_active`` tracks the post-TTS grace window precisely.""" + + async def test_begin_speaking_clears_flag(self) -> None: + handler = _make_pipeline_handler() + handler._tail_grace_active = True + await handler._begin_speaking() + assert handler._is_speaking is True + assert handler._tail_grace_active is False + + async def test_grace_sets_then_clears_flag(self, monkeypatch) -> None: + monkeypatch.setenv("PATTER_TTS_TAIL_GRACE_MS", "20") + handler = _make_pipeline_handler() + await handler._begin_speaking() + handler._first_audio_sent_at = time.time() - 1.0 + + await handler._end_speaking_with_grace() + # Grace pending: still "speaking" but flagged as tail grace. 
+ assert handler._is_speaking is True + assert handler._tail_grace_active is True + + await asyncio.sleep(0.06) # > 20 ms grace + assert handler._is_speaking is False + assert handler._tail_grace_active is False + + async def test_zero_grace_does_not_enter_tail_grace(self, monkeypatch) -> None: + monkeypatch.setenv("PATTER_TTS_TAIL_GRACE_MS", "0") + handler = _make_pipeline_handler() + await handler._begin_speaking() + await handler._end_speaking_with_grace() + assert handler._is_speaking is False + assert handler._tail_grace_active is False + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestLlmCancelEventReset: + """A barge-in's cancel event must not leak into the next turn's dispatch.""" + + async def test_dispatch_uses_fresh_cancel_event(self) -> None: + handler = _make_pipeline_handler() + + captured: dict = {} + + async def _fake_stream(): + yield "hello " + + class _FakeLoop: + def run(self, text, history, ctx, *, cancel_event=None, **kwargs): + # Record whether the event handed to the LLM was already set + # (i.e. leaked from a previous turn's barge-in). + captured["was_set"] = bool(cancel_event and cancel_event.is_set()) + return _fake_stream() + + handler._llm_loop = _FakeLoop() + + # Avoid the real TTS speak path: stub the response processor. + async def _fake_process(result, call_id): + # Drain the generator so the loop's run() actually executed. + async for _ in result: + pass + return "hello" + + handler._process_streaming_response = _fake_process # type: ignore[assignment] + handler._emit_assistant_transcript = AsyncMock() + + # Simulate a stale cancel left set by a previous turn's barge-in. 
+ handler._llm_cancel_event.set() + assert handler._llm_cancel_event.is_set() is True + + await handler._dispatch_turn("dimmi che ore sono") + + assert captured.get("was_set") is False diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index 342676c..fefd056 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -417,6 +417,17 @@ export class StreamHandler { private stt: STTAdapter | null = null; private tts: TTSAdapter | null = null; private isSpeaking = false; + /** + * True only while the post-TTS tail-grace window is pending: the agent has + * finished its turn but ``isSpeaking`` is still held for + * ``PATTER_TTS_TAIL_GRACE_MS`` to swallow the fading echo tail. A VAD + * ``speech_start`` (or a transcript) during this window is the user's NEXT + * turn, not a barge-in — there is nothing left to interrupt. Set by + * ``endSpeakingWithGrace``; cleared by ``beginSpeaking``, the grace flip, + * ``cancelSpeaking``, and ``endTailGraceForNewTurn``. Parity with Python + * ``_tail_grace_active``. + */ + private tailGraceActive = false; /** * Ring buffer of inbound PCM16 16 kHz frames captured while the agent * is speaking and the self-hearing guard is dropping audio. On @@ -615,6 +626,10 @@ export class StreamHandler { } this.speakingGeneration++; this.isSpeaking = true; + // A fresh turn is actively streaming — not in the post-TTS echo window. + // Clear the tail-grace flag so a VAD speech_start during this turn is + // treated as a real barge-in (not a new-turn rescue). 
+ this.tailGraceActive = false; this.speakingStartedAt = Date.now(); this.suppressedSpeechPending = false; // Stamp ``firstAudioSentAt`` synchronously for EVERY turn so the @@ -670,6 +685,7 @@ export class StreamHandler { private cancelSpeaking(): void { this.speakingGeneration++; // invalidates pending grace timers this.isSpeaking = false; + this.tailGraceActive = false; this.speakingStartedAt = null; this.firstAudioSentAt = null; this.lastCancelAt = Date.now(); @@ -782,10 +798,16 @@ export class StreamHandler { if (grace > 0) { const gen = this.speakingGeneration; this.clearGraceTimer(); + // The agent has finished pushing audio; ``isSpeaking`` is now held only + // to suppress the fading echo tail. Mark the tail-grace window so fast + // next-turn speech is rescued as a new turn rather than mis-detected as + // a barge-in. + this.tailGraceActive = true; this.graceTimer = setTimeout(() => { this.graceTimer = null; if (this.speakingGeneration === gen) { this.isSpeaking = false; + this.tailGraceActive = false; this.speakingStartedAt = null; this.firstAudioSentAt = null; this.clearPendingBargeIn(); @@ -806,6 +828,7 @@ export class StreamHandler { }, grace); } else { this.isSpeaking = false; + this.tailGraceActive = false; this.speakingStartedAt = null; this.firstAudioSentAt = null; this.clearPendingBargeIn(); @@ -818,6 +841,38 @@ export class StreamHandler { } } + /** + * End the post-TTS tail-grace window because the user has begun their next + * turn. Unlike a barge-in, the agent's response already played out in full + * — there is nothing to cancel and no turn was interrupted. We flip the + * speaking flag off (bumping ``speakingGeneration`` so the scheduled grace + * timer no-ops), recover any leading audio the self-hearing guard captured + * into the ring (the user's first ~250 ms, which VAD needed before it could + * emit ``speech_start``), and let the live STT stream take over. 
We do NOT + * call ``sendClear``, ``recordBargeinDetected`` or ``recordTurnInterrupted`` + * — none apply to a turn that completed normally. + * + * Without this, fast next-turn speech (humans reply in 200-700 ms, well + * inside the 1500 ms default grace) is withheld from STT and recorded as an + * empty ``[interrupted]`` turn, after which the agent goes silent for the + * rest of the call. Parity with Python ``_end_tail_grace_for_new_turn``. + */ + private endTailGraceForNewTurn(): void { + this.isSpeaking = false; + this.tailGraceActive = false; + this.speakingStartedAt = null; + this.firstAudioSentAt = null; + this.speakingGeneration++; // invalidates the pending grace timer + this.clearGraceTimer(); + this.clearPendingBargeIn(); + void this.resetBargeInStrategies(); + // Recover the user's leading words. Same rationale as the barge-in flush + // — but here it is the only audio recovery, since the agent already + // stopped and no new TTS will overwrite it. + this.suppressedSpeechPending = false; + this.flushInboundAudioRing(); + } + private async resetBargeInStrategies(): Promise<void> { if (this.bargeInStrategies.length === 0) return; const { resetStrategies } = await import('./services/barge-in-strategies.js'); @@ -1427,6 +1482,18 @@ export class StreamHandler { ); } if (evt?.type === 'speech_start') { + // Tail-grace new-turn rescue: the agent already finished its turn + // and we are only in the post-TTS echo-guard window. A VAD + // speech_start here is the user's next turn, not a barge-in — end + // the grace so this utterance flows to STT as a clean new turn + // instead of being swallowed by the self-hearing guard or + // mislabelled as an empty ``[interrupted]`` turn (the multi-turn + // silence bug). After this ``isSpeaking`` is false, so the + // if/else below is a no-op and the frame falls through to STT. + // Parity with Python ``_end_tail_grace_for_new_turn``.
+ if (this.isSpeaking && this.tailGraceActive) { + this.endTailGraceForNewTurn(); + } const phantomSuppressed = this.isSpeaking && !this.canBargeIn(); if (phantomSuppressed) { // Within the per-turn warmup gate. With AEC on this is the @@ -2518,6 +2585,15 @@ export class StreamHandler { isFinal?: boolean; }): Promise<boolean> { if (!transcript.text || !this.isSpeaking) return false; + if (this.tailGraceActive) { + // A transcript during the post-TTS tail grace is the next turn, not a + // barge-in (the agent already finished). End the grace and return + // WITHOUT cancelling — the same transcript then flows on to dispatch as + // a normal new turn. Closes the race where a transcript lands before + // the VAD speech_start rescue fires. + this.endTailGraceForNewTurn(); + return false; + } if (!this.canBargeIn()) { getLogger().info( `Barge-in transcript suppressed (agent speaking < gate, aec=${this.aec ? 'on' : 'off'})`, @@ -2560,6 +2636,12 @@ export class StreamHandler { */ private handleBargeIn(transcript: { text?: string; isFinal?: boolean }): boolean { if (!transcript.text || !this.isSpeaking) return false; + if (this.tailGraceActive) { + // Tail-grace transcript = next turn, not a barge-in. End the grace and + // let the transcript dispatch normally (parity with the async path). + this.endTailGraceForNewTurn(); + return false; + } if (this.bargeInStrategies.length === 0) { // Legacy synchronous path — preserve exact byte-for-byte behaviour // for users who haven't opted into the confirm pipeline. diff --git a/libraries/typescript/tests/pipeline-multiturn-tail-grace.mocked.test.ts b/libraries/typescript/tests/pipeline-multiturn-tail-grace.mocked.test.ts new file mode 100644 index 0000000..0a13846 --- /dev/null +++ b/libraries/typescript/tests/pipeline-multiturn-tail-grace.mocked.test.ts @@ -0,0 +1,187 @@ +/** + * [mocked] Multi-turn turn-taking — tail-grace new-turn rescue (parity with + * Python ``test_pipeline_multiturn_tail_grace.py``).
+ * + * Reproduces the live-call failure where the FIRST turn works but every + * SUBSEQUENT turn goes silent with a ghost ``[interrupted]`` metrics turn. + * + * Root cause: after the agent finishes, ``endSpeakingWithGrace`` keeps + * ``isSpeaking=true`` for ``PATTER_TTS_TAIL_GRACE_MS`` (default 1500 ms) to + * swallow the fading echo tail. Humans reply in 200-700 ms — inside that + * window — so the user's next utterance was mis-detected as a barge-in + * (``recordTurnInterrupted`` + leading audio withheld from STT), so no + * transcript was produced and the agent never answered. The fix treats a VAD + * ``speech_start`` (or a transcript) during the tail grace as a NEW turn. + * + * AUTHENTIC: the real StreamHandler + CallMetricsAccumulator. Mocked only at + * the external boundary (telephony bridge, STT adapter). Private state/methods + * are exercised via casts — the same surface the audio/STT loops drive. + */ +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { StreamHandler } from '../src/stream-handler'; +import type { TelephonyBridge, StreamHandlerDeps } from '../src/stream-handler'; +import { MetricsStore } from '../src/dashboard/store'; +import { RemoteMessageHandler } from '../src/remote-message'; +import type { WebSocket as WSWebSocket } from 'ws'; +import type { AgentOptions } from '../src/types'; + +function makeMockWs(): WSWebSocket { + return { + send: vi.fn(), + close: vi.fn(), + on: vi.fn(), + once: vi.fn(), + readyState: 1, + removeListener: vi.fn(), + addEventListener: vi.fn(), + removeEventListener: vi.fn(), + } as unknown as WSWebSocket; +} + +function makeBridge(): TelephonyBridge { + return { + label: 'Twilio', + telephonyProvider: 'twilio', + sendAudio: vi.fn(), + sendMark: vi.fn(), + sendClear: vi.fn(), + transferCall: vi.fn().mockResolvedValue(undefined), + endCall: vi.fn().mockResolvedValue(undefined), + createStt: vi.fn().mockReturnValue(null), + queryTelephonyCost: vi.fn().mockResolvedValue(undefined), 
+ } as unknown as TelephonyBridge; +} + +function makeDeps(bridge: TelephonyBridge, store: MetricsStore): StreamHandlerDeps { + const agent: AgentOptions = { + systemPrompt: 'You are a helpful test agent.', + provider: 'pipeline', + model: 'gpt-4o-mini', + voice: 'alloy', + }; + return { + config: {}, + agent, + bridge, + metricsStore: store, + pricing: null, + remoteHandler: new RemoteMessageHandler(), + recording: false, + buildAIAdapter: vi.fn().mockReturnValue(null), + sanitizeVariables: vi.fn((raw: Record) => raw), + resolveVariables: vi.fn((tpl: string) => tpl), + } as unknown as StreamHandlerDeps; +} + +function makeHandler(): StreamHandler { + const handler = new StreamHandler( + makeDeps(makeBridge(), new MetricsStore()), + makeMockWs(), + '+15551110000', + '+15552220000', + ); + handler.setStreamSid('MZ-multiturn'); + return handler; +} + +const FRAME = Buffer.from([0, 1, 0, 1, 0, 1]); + +describe('[mocked] pipeline multi-turn tail-grace rescue', () => { + afterEach(() => { + vi.useRealTimers(); + }); + + it('endSpeakingWithGrace flags the tail grace, the grace timer clears it', () => { + vi.useFakeTimers(); + const h = makeHandler() as unknown as { + isSpeaking: boolean; + tailGraceActive: boolean; + endSpeakingWithGrace(): void; + }; + h.isSpeaking = true; + h.endSpeakingWithGrace(); + // Grace pending: still "speaking" but flagged as tail grace. 
+ expect(h.isSpeaking).toBe(true); + expect(h.tailGraceActive).toBe(true); + + vi.advanceTimersByTime(1600); // > 1500 ms default grace + expect(h.isSpeaking).toBe(false); + expect(h.tailGraceActive).toBe(false); + }); + + it('beginSpeaking clears the tail-grace flag', async () => { + const h = makeHandler(); + const priv = h as unknown as { tailGraceActive: boolean; isSpeaking: boolean }; + priv.tailGraceActive = true; + await (h as unknown as { beginSpeaking(): Promise }).beginSpeaking(); + expect(priv.isSpeaking).toBe(true); + expect(priv.tailGraceActive).toBe(false); + }); + + it('a transcript during the tail grace is a new turn, not a barge-in', () => { + const bridge = makeBridge(); + const handler = new StreamHandler( + makeDeps(bridge, new MetricsStore()), + makeMockWs(), + '+15551110000', + '+15552220000', + ); + const sttSendAudio = vi.fn(); + const priv = handler as unknown as { + isSpeaking: boolean; + tailGraceActive: boolean; + speakingStartedAt: number | null; + firstAudioSentAt: number | null; + inboundAudioRing: Buffer[]; + stt: unknown; + handleBargeIn(t: { text?: string; isFinal?: boolean }): boolean; + }; + priv.stt = { sendAudio: sttSendAudio, finalize: vi.fn() }; + priv.isSpeaking = true; + priv.tailGraceActive = true; + priv.speakingStartedAt = Date.now() - 2000; + priv.firstAudioSentAt = Date.now() - 2000; + priv.inboundAudioRing = [FRAME, FRAME]; + + const interrupted = priv.handleBargeIn({ text: 'che ore sono', isFinal: true }); + + expect(interrupted).toBe(false); + expect(priv.isSpeaking).toBe(false); + expect(priv.tailGraceActive).toBe(false); + // Not a barge-in: nothing was cleared on the carrier. + expect(bridge.sendClear).not.toHaveBeenCalled(); + // Leading audio recovered: the ring was flushed to STT. 
+ expect(sttSendAudio).toHaveBeenCalledTimes(2); + }); + + it('a transcript during ACTIVE TTS still triggers a real barge-in', () => { + const bridge = makeBridge(); + const handler = new StreamHandler( + makeDeps(bridge, new MetricsStore()), + makeMockWs(), + '+15551110000', + '+15552220000', + ); + const priv = handler as unknown as { + isSpeaking: boolean; + tailGraceActive: boolean; + speakingStartedAt: number | null; + firstAudioSentAt: number | null; + inboundAudioRing: Buffer[]; + stt: unknown; + handleBargeIn(t: { text?: string; isFinal?: boolean }): boolean; + }; + priv.stt = { sendAudio: vi.fn(), finalize: vi.fn() }; + priv.isSpeaking = true; + priv.tailGraceActive = false; // active TTS, NOT the post-completion grace + priv.speakingStartedAt = Date.now() - 2000; + priv.firstAudioSentAt = Date.now() - 2000; + priv.inboundAudioRing = []; + + const interrupted = priv.handleBargeIn({ text: 'actually wait', isFinal: true }); + + expect(interrupted).toBe(true); + expect(priv.isSpeaking).toBe(false); + expect(bridge.sendClear).toHaveBeenCalled(); + }); +}); From c34a7db4a6961bd1d118d659cb1ba99c31d6d6a8 Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Sat, 6 Jun 2026 19:55:55 +0200 Subject: [PATCH 03/11] =?UTF-8?q?fix(pipeline):=20barge-in=20during=20a=20?= =?UTF-8?q?turn=20=E2=80=94=20decoupled=20dispatch=20+=20pre-first-token?= =?UTF-8?q?=20abort=20(Hermes/OpenClaw)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The caller could not interrupt the agent mid-response. The STT receive loop awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` / `await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw tool-running turn it stopped reading transcripts — a barge-in transcript ("ferma") was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD, the transcript path is the only barge-in fallback and it was structurally dead. 
Three coordinated changes, full Python/TypeScript parity: 1. Decoupled single-in-flight dispatch. The turn runs as one tracked background task (_dispatch_task / dispatchTask) so the receive loop keeps draining transcripts and runs handleBargeIn against the LIVE turn. The loop settles the previous dispatch before launching the next (single-in-flight), so conversation_history / metrics ordering is unchanged; the loop still awaits the final turn to settle before returning, so existing tests that inspect state right after the loop are unaffected. 2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk cancel_event check never runs. The provider now races create()+first-byte against the cancel signal and spawns a watchdog that close()s the response the instant a barge-in fires (TS already aborts promptly via fetch + AbortController). The VAD legacy barge-in branch now also sets _llm_cancel_event (previously it only flipped _is_speaking, which Hermes never observed pre-first-token), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast. 3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked links where the VAD never fires. The leading-edge ring is still captured. Echo caveat (WARN on enable): without AEC the agent's own voice may be transcribed as a phantom interruption — pair with agent.barge_in_strategies. Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the just-landed tail-grace multi-turn fix is preserved. Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3), pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass; tsc + build clean. 
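
Illustrative sketch of change 1 (decoupled, single-in-flight dispatch). This is not the code in stream_handler.py: `ReceiveLoopSketch`, `scripted_transcripts` and the timings are hypothetical stand-ins, and the turn body only simulates LLM+TTS by waiting on the cancel event. It assumes the barge-in transcript is then dispatched as the next turn, which may differ from the real handler. It is meant only to show the shape of the idea: the loop keeps reading transcripts while a turn runs, and settles the previous turn before launching the next.

```python
import asyncio


class ReceiveLoopSketch:
    """Hypothetical stand-in for the STT receive loop (not the real handler)."""

    def __init__(self):
        self.cancel_event = None   # recreated at the top of every turn
        self.dispatch_task = None  # the single in-flight turn, if any
        self.speaking = False
        self.log = []

    async def _dispatch_turn(self, text):
        # Fresh cancel event per turn so a previous barge-in cannot leak in.
        self.cancel_event = asyncio.Event()
        self.speaking = True
        try:
            try:
                # Simulated LLM+TTS: runs 0.2 s unless a barge-in cancels it.
                await asyncio.wait_for(self.cancel_event.wait(), timeout=0.2)
                self.log.append(f"interrupted:{text}")
            except asyncio.TimeoutError:
                self.log.append(f"answered:{text}")
        finally:
            self.speaking = False

    async def run(self, transcripts):
        async for text in transcripts:
            if self.speaking:
                # Transcript arrived against the LIVE turn: signal a barge-in.
                self.cancel_event.set()
            if self.dispatch_task is not None:
                # Single-in-flight: settle the previous turn before the next.
                await self.dispatch_task
            # Launch the turn in the background; the loop keeps reading.
            self.dispatch_task = asyncio.create_task(self._dispatch_turn(text))
        if self.dispatch_task is not None:
            # Await the final turn so callers see settled state on return.
            await self.dispatch_task


async def scripted_transcripts():
    yield "che ore sono"
    await asyncio.sleep(0.05)  # caller speaks again while the turn is live
    yield "ferma"


async def main():
    loop = ReceiveLoopSketch()
    await loop.run(scripted_transcripts())
    return loop.log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With an inline `await self._dispatch_turn(text)` in place of `create_task`, the second transcript would only be read after the first turn finished, which is the failure described above.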
--- CHANGELOG.md | 4 + .../python/getpatter/llm/openai_compatible.py | 132 +++++++---- .../python/getpatter/services/llm_loop.py | 179 ++++++++++----- libraries/python/getpatter/stream_handler.py | 106 ++++++++- .../tests/test_llm_hermes_openclaw_presets.py | 6 +- .../tests/test_llm_openai_compatible.py | 5 +- .../test_pipeline_bargein_backgrounded.py | 206 ++++++++++++++++++ .../unit/test_provider_prefirsttoken_abort.py | 152 +++++++++++++ libraries/typescript/src/stream-handler.ts | 195 +++++++++++------ .../tests/long-turn-filler.mocked.test.ts | 6 + ...peline-bargein-backgrounded.mocked.test.ts | 187 ++++++++++++++++ 11 files changed, 1009 insertions(+), 169 deletions(-) create mode 100644 libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py create mode 100644 libraries/python/tests/unit/test_provider_prefirsttoken_abort.py create mode 100644 libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 6baed1d..a98937b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,10 @@ - **Multi-turn pipeline conversations no longer go silent after the first turn.** The agent answered the first turn but then ignored every subsequent utterance, leaving a ghost metrics turn of `user_text='' agent_text='[interrupted]'`. Two root causes in the pipeline turn-taking state machine: - **Tail-grace misclassified the next turn as a barge-in.** After the agent finishes speaking, `_end_speaking_with_grace` keeps `_is_speaking=true` for `PATTER_TTS_TAIL_GRACE_MS` (default 1500 ms) to swallow the fading TTS echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was treated as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a ≤260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. 
A new `_tail_grace_active` / `tailGraceActive` flag now distinguishes "actively streaming TTS" from "post-TTS echo guard"; a VAD `speech_start` (or a transcript) during the tail grace ends the grace and is dispatched as a clean new turn — recovering the leading audio from the ring instead of dropping it — with no spurious `record_turn_interrupted`. Tunable `PATTER_TTS_TAIL_GRACE_MS` (0 / 200 / 1500) is now safe for fast next-turn speech. - **(Python) A barge-in's per-turn cancel event leaked into the next turn.** `_llm_cancel_event` was only recreated *inside* `_process_streaming_response` — after `LLMLoop.run` had already been handed the (still-set) event for the next turn — so the turn following any real barge-in bailed immediately. The event is now recreated at the top of `_dispatch_turn`, before dispatch (TypeScript already allocated a fresh `AbortController` per turn). `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`. +- **Pipeline barge-in now works DURING a turn — including long Hermes/OpenClaw tool-running turns.** The caller could not interrupt the agent mid-response: the STT receive loop awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` / `await this.runPipelineLlm(...)`), so for the whole 30-90 s of a tool-running agent-runtime turn it stopped reading transcripts — a barge-in transcript was only processed *after* the turn ended ("ferma" → answered late). Three coordinated changes, full Python/TS parity: + - **Decoupled, single-in-flight dispatch.** The turn now runs as one tracked background task (`_dispatch_task` / `dispatchTask`) so the receive loop keeps draining transcripts and runs barge-in detection against the LIVE turn. Exactly one dispatch is in flight: the loop settles the previous one before launching the next, so `conversation_history` / metrics ordering is unchanged. 
With no barge-in (default, VAD present, normal LLM) behaviour is unchanged — the loop still awaits the final turn to settle before returning. + - **Prompt pre-first-token abort (Python).** Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk `cancel_event` check never runs. The provider now races `create()` + first-byte against the cancel signal and spawns a watchdog that `close()`s the response the instant a barge-in fires, so the request is torn down immediately instead of blocking the next turn (TS already aborts promptly via `fetch` + `AbortController`). The VAD legacy barge-in branch now also sets `_llm_cancel_event` (it previously only flipped `_is_speaking`), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast. + - **`PATTER_FORWARD_STT_WHILE_SPEAKING` (opt-in, default off).** Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked PSTN links where the VAD never fires. The leading-edge ring buffer is still captured. **Echo caveat:** without AEC the agent's own voice may be transcribed as a phantom interruption — pair with `agent.barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `.../services/llm_loop.py`, `.../llm/openai_compatible.py`, `libraries/typescript/src/stream-handler.ts`. 
## 0.6.5 (2026-06-05) diff --git a/libraries/python/getpatter/llm/openai_compatible.py b/libraries/python/getpatter/llm/openai_compatible.py index 0825c30..902ba83 100644 --- a/libraries/python/getpatter/llm/openai_compatible.py +++ b/libraries/python/getpatter/llm/openai_compatible.py @@ -155,10 +155,17 @@ def __init__( super().__init__(api_key=key or _EMPTY_KEY_SENTINEL, model=model, **kwargs) default_headers = {"User-Agent": self._user_agent, **(extra_headers or {})} + # Bound a connect that never lands (gateway down) to ~10 s while keeping + # the long read budget for tool-running turns. A scalar timeout would + # apply the full (120 s) ceiling to connect too, so a dead Hermes would + # hang the first turn for the full budget instead of failing fast. + import httpx as _httpx + + _client_timeout: Any = _httpx.Timeout(timeout, connect=min(timeout, 10.0)) self._client: Any = AsyncOpenAI( api_key=key or _EMPTY_KEY_SENTINEL, base_url=base_url, - timeout=timeout, + timeout=_client_timeout, default_headers=default_headers, ) @@ -304,58 +311,87 @@ async def stream( kwargs = self._build_completion_kwargs( messages, tools, call_id=call_id, caller=caller, callee=callee ) - response = await self._client.chat.completions.create(**kwargs) + # Agent runtimes run tools for tens of seconds before the first token; + # race create()/first-byte against the cancel signal + spawn a close() + # watchdog so a barge-in during that pre-first-token window aborts the + # request promptly instead of blocking the next turn (see base + # ``OpenAILLMProvider._open_stream_with_cancel`` / ``_abort_on_cancel``). 
+ response = await self._open_stream_with_cancel(kwargs, cancel_event) + if response is None: + return + abort_watcher = ( + asyncio.ensure_future(self._abort_on_cancel(cancel_event, response)) + if cancel_event is not None + else None + ) last_usage = None - async for chunk in response: + try: + async for chunk in response: + if cancel_event is not None and cancel_event.is_set(): + try: + await response.close() + except Exception: # noqa: BLE001 - best-effort cleanup + pass + return + usage = getattr(chunk, "usage", None) + if usage is not None: + last_usage = usage + + delta = chunk.choices[0].delta if chunk.choices else None + if delta is None: + continue + + if delta.content: + yield {"type": "text", "content": delta.content} + + if delta.tool_calls: + for tc in delta.tool_calls: + yield { + "type": "tool_call", + "index": tc.index, + "id": tc.id, + "name": tc.function.name if tc.function else None, + "arguments": tc.function.arguments if tc.function else None, + } + + if last_usage is not None: + cache_read = 0 + details = getattr(last_usage, "prompt_tokens_details", None) + if details is not None: + cache_read = getattr(details, "cached_tokens", 0) or 0 + # Mirror OpenAILLMProvider.stream exactly: prompt_tokens is the + # TOTAL input (uncached + cached); subtract cached so input_tokens + # is the uncached portion and cost isn't double-billed. 
+ prompt_tokens = getattr(last_usage, "prompt_tokens", 0) or 0 + uncached_input = max(0, prompt_tokens - cache_read) + completion_tokens = getattr(last_usage, "completion_tokens", 0) or 0 + self._record_completion_cost( + prompt_tokens=prompt_tokens, + completion_tokens=completion_tokens, + ) + yield { + "type": "usage", + "input_tokens": uncached_input, + "output_tokens": completion_tokens, + "cache_read_tokens": cache_read, + } + except asyncio.CancelledError: + raise + except Exception: + # close()-induced read error during a barge-in is a clean stop, not + # an LLM error — swallow when cancelling so llm_error_message does + # not fire. Genuine upstream errors (cancel not set) propagate. if cancel_event is not None and cancel_event.is_set(): + return + raise + finally: + if abort_watcher is not None: + abort_watcher.cancel() try: - await response.close() - except Exception: # noqa: BLE001 - best-effort cleanup + await abort_watcher + except (asyncio.CancelledError, Exception): pass - return - usage = getattr(chunk, "usage", None) - if usage is not None: - last_usage = usage - - delta = chunk.choices[0].delta if chunk.choices else None - if delta is None: - continue - - if delta.content: - yield {"type": "text", "content": delta.content} - - if delta.tool_calls: - for tc in delta.tool_calls: - yield { - "type": "tool_call", - "index": tc.index, - "id": tc.id, - "name": tc.function.name if tc.function else None, - "arguments": tc.function.arguments if tc.function else None, - } - - if last_usage is not None: - cache_read = 0 - details = getattr(last_usage, "prompt_tokens_details", None) - if details is not None: - cache_read = getattr(details, "cached_tokens", 0) or 0 - # Mirror OpenAILLMProvider.stream exactly: prompt_tokens is the - # TOTAL input (uncached + cached); subtract cached so input_tokens - # is the uncached portion and cost isn't double-billed. 
- prompt_tokens = getattr(last_usage, "prompt_tokens", 0) or 0 - uncached_input = max(0, prompt_tokens - cache_read) - completion_tokens = getattr(last_usage, "completion_tokens", 0) or 0 - self._record_completion_cost( - prompt_tokens=prompt_tokens, - completion_tokens=completion_tokens, - ) - yield { - "type": "usage", - "input_tokens": uncached_input, - "output_tokens": completion_tokens, - "cache_read_tokens": cache_read, - } class LLM(OpenAICompatibleLLMProvider): diff --git a/libraries/python/getpatter/services/llm_loop.py b/libraries/python/getpatter/services/llm_loop.py index fb4c9f4..c5bd6ec 100644 --- a/libraries/python/getpatter/services/llm_loop.py +++ b/libraries/python/getpatter/services/llm_loop.py @@ -662,6 +662,55 @@ def _build_completion_kwargs( kwargs["max_completion_tokens"] = self._max_tokens return kwargs + async def _open_stream_with_cancel(self, kwargs: dict, cancel_event): + """Create the streaming completion, aborting promptly if ``cancel_event`` + fires while awaiting — INCLUDING before the first SSE byte. + + Agent runtimes (Hermes / OpenClaw) run tools/memory/skills for tens of + seconds before the first token; the ``create()`` await (and the first + ``__anext__``) would otherwise be unabortable, so a barge-in during + that window could not free the connection and the next user turn would + block behind it. Races ``create()`` against the cancel signal and + cancels the in-flight POST if the user interrupts first. + + Returns the streaming response, or ``None`` if cancelled before the + response object existed. 
+ """ + if cancel_event is None: + return await self._client.chat.completions.create(**kwargs) + create_task = asyncio.ensure_future( + self._client.chat.completions.create(**kwargs) + ) + cancel_task = asyncio.ensure_future(cancel_event.wait()) + try: + await asyncio.wait( + {create_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED + ) + finally: + cancel_task.cancel() + if not create_task.done(): + # The caller interrupted before the upstream even responded — + # abort the in-flight POST so the socket is freed immediately. + create_task.cancel() + try: + await create_task + except BaseException: # noqa: BLE001 - aborting in-flight request + pass + return None + return create_task.result() + + @staticmethod + async def _abort_on_cancel(cancel_event, response) -> None: + """Close the streaming response the instant ``cancel_event`` fires so a + consumer parked on the first SSE byte unblocks immediately instead of + waiting out the read timeout. Best-effort; cancelled in the stream's + ``finally`` once the turn ends normally.""" + try: + await cancel_event.wait() + await response.close() + except Exception: # noqa: BLE001 - best-effort teardown + pass + async def stream( self, messages: list[dict], @@ -694,64 +743,88 @@ async def stream( it into ``_build_completion_kwargs``. """ kwargs = self._build_completion_kwargs(messages, tools) - response = await self._client.chat.completions.create(**kwargs) + response = await self._open_stream_with_cancel(kwargs, cancel_event) + if response is None: + # Cancelled before the first byte (barge-in during the agent + # runtime's pre-first-token tool window) — nothing to yield. 
+ return + abort_watcher = ( + asyncio.ensure_future(self._abort_on_cancel(cancel_event, response)) + if cancel_event is not None + else None + ) last_usage = None - async for chunk in response: + try: + async for chunk in response: + if cancel_event is not None and cancel_event.is_set(): + try: + await response.close() + except Exception: # noqa: BLE001 - best-effort cleanup + pass + return + # Usage chunks have empty ``choices`` and a populated ``usage``. + usage = getattr(chunk, "usage", None) + if usage is not None: + last_usage = usage + + delta = chunk.choices[0].delta if chunk.choices else None + if delta is None: + continue + + if delta.content: + yield {"type": "text", "content": delta.content} + + if delta.tool_calls: + for tc in delta.tool_calls: + yield { + "type": "tool_call", + "index": tc.index, + "id": tc.id, + "name": tc.function.name if tc.function else None, + "arguments": tc.function.arguments if tc.function else None, + } + + if last_usage is not None: + cache_read = 0 + details = getattr(last_usage, "prompt_tokens_details", None) + if details is not None: + cache_read = getattr(details, "cached_tokens", 0) or 0 + # OpenAI's prompt_tokens is the TOTAL input (uncached + cached). + # Subtract cached so input_tokens represents only the uncached + # portion and calculate_llm_cost doesn't bill cached tokens at + # the full input rate (mirrors libraries/typescript/src/llm-loop.ts:296-305). 
+ prompt_tokens = getattr(last_usage, "prompt_tokens", 0) or 0 + uncached_input = max(0, prompt_tokens - cache_read) + completion_tokens = getattr(last_usage, "completion_tokens", 0) or 0 + self._record_completion_cost( + prompt_tokens=prompt_tokens, + completion_tokens=completion_tokens, + ) + yield { + "type": "usage", + "input_tokens": uncached_input, + "output_tokens": completion_tokens, + "cache_read_tokens": cache_read, + } + except asyncio.CancelledError: + raise + except Exception: + # A read error AFTER the cancel watchdog closed the response is the + # expected clean stop on a (possibly pre-first-token) barge-in — + # swallow it so it is not surfaced as an LLM error (which would + # trip the spoken llm_error_message fallback). Any genuine upstream + # error (cancel_event not set) propagates unchanged. if cancel_event is not None and cancel_event.is_set(): - # Best-effort cancel of the upstream stream so the underlying - # HTTP connection is freed instead of waiting for the server - # to close. ``response.close()`` is sync on AsyncOpenAI and - # may raise if the stream already ended — best-effort. + return + raise + finally: + if abort_watcher is not None: + abort_watcher.cancel() try: - await response.close() - except Exception: # noqa: BLE001 - best-effort cleanup + await abort_watcher + except (asyncio.CancelledError, Exception): pass - return - # Usage chunks have empty ``choices`` and a populated ``usage``. 
- usage = getattr(chunk, "usage", None) - if usage is not None: - last_usage = usage - - delta = chunk.choices[0].delta if chunk.choices else None - if delta is None: - continue - - if delta.content: - yield {"type": "text", "content": delta.content} - - if delta.tool_calls: - for tc in delta.tool_calls: - yield { - "type": "tool_call", - "index": tc.index, - "id": tc.id, - "name": tc.function.name if tc.function else None, - "arguments": tc.function.arguments if tc.function else None, - } - - if last_usage is not None: - cache_read = 0 - details = getattr(last_usage, "prompt_tokens_details", None) - if details is not None: - cache_read = getattr(details, "cached_tokens", 0) or 0 - # OpenAI's prompt_tokens is the TOTAL input (uncached + cached). - # Subtract cached so input_tokens represents only the uncached - # portion and calculate_llm_cost doesn't bill cached tokens at - # the full input rate (mirrors libraries/typescript/src/llm-loop.ts:296-305). - prompt_tokens = getattr(last_usage, "prompt_tokens", 0) or 0 - uncached_input = max(0, prompt_tokens - cache_read) - completion_tokens = getattr(last_usage, "completion_tokens", 0) or 0 - self._record_completion_cost( - prompt_tokens=prompt_tokens, - completion_tokens=completion_tokens, - ) - yield { - "type": "usage", - "input_tokens": uncached_input, - "output_tokens": completion_tokens, - "cache_read_tokens": cache_read, - } def _record_completion_cost( self, *, prompt_tokens: int, completion_tokens: int diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index beb368f..14a0798 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -2347,6 +2347,27 @@ def __init__( # because ``agent`` is a frozen dataclass. 
self._auto_vad = None self._stt_task: asyncio.Task | None = None + # The in-flight turn dispatch (LLM + TTS) runs as a SINGLE tracked task + # so the STT receive loop keeps draining transcripts during a long + # (30-90 s) agent-runtime turn and can fire transcript-based barge-in + # against the LIVE turn. Exactly one is active at a time — the loop + # awaits the previous one to settle before launching the next, so + # conversation_history / metrics ordering is unchanged. None when idle. + self._dispatch_task: asyncio.Task | None = None + # Opt-in (default OFF): forward inbound audio to STT even while the + # agent is speaking, so the transcript barge-in path can receive a + # transcript on echo-masked PSTN links where the VAD never fires. + # ECHO RISK without AEC — see ``on_audio_received`` self-hearing guard. + self._forward_stt_while_speaking = os.environ.get( + "PATTER_FORWARD_STT_WHILE_SPEAKING", "" + ).strip().lower() in ("1", "true", "yes") + if self._forward_stt_while_speaking: + logger.warning( + "PATTER_FORWARD_STT_WHILE_SPEAKING=on: inbound audio is sent to " + "STT during TTS so transcript barge-in works on echo-masked " + "links. Without AEC the agent's own voice may be transcribed as " + "a phantom interruption — pair with agent.barge_in_strategies." + ) self._is_speaking = False # True only while the post-TTS tail-grace window is pending: the # agent has finished its turn but ``_is_speaking`` is still held for @@ -3690,14 +3711,58 @@ async def _stt_loop(self) -> None: self.metrics.anchor_user_speech_start() continue - await self._dispatch_turn(transcript.text) + # Decouple dispatch from the receive loop: run the turn as a + # SINGLE tracked task so the ``async for`` keeps draining + # transcripts during a long (30-90 s) agent-runtime turn and + # can fire transcript-based barge-in against the LIVE turn — + # the head-of-line-blocking fix. 
Settle the previous turn + # first so exactly one dispatch is in flight and the per-turn + # conversation_history / metrics ordering is preserved. + await self._await_dispatch_settle() + self._dispatch_task = asyncio.create_task( + self._dispatch_turn(transcript.text) + ) except Exception as exc: logger.exception("Pipeline STT loop error: %s", exc) + finally: + # Return only once the last dispatch fully settles, so callers and + # tests that inspect state right after ``await _stt_loop()`` still + # observe completed turn effects (the loop no longer blocks DURING + # a turn, but it does block until the FINAL turn is done). + await self._await_dispatch_settle() + + async def _await_dispatch_settle(self) -> None: + """Await the in-flight turn dispatch to fully settle. + + Called before launching the next turn (single-in-flight) and once + more when the STT loop exits. Two cases: the prior dispatch either + completed naturally (await is a no-op) or was cancelled by a barge-in + (await lets its ``finally`` — grace flip, LLM span close, ring reset, + history flush — run BEFORE the next turn's ``_begin_speaking``). Always + clears the handle so a backgrounded-task exception is retrieved (no + ``Task exception was never retrieved`` leak). + """ + task = self._dispatch_task + if task is None: + return + try: + await task + except asyncio.CancelledError: # pragma: no cover - teardown path + pass + except Exception as exc: # pragma: no cover - already handled in dispatch + logger.debug("backgrounded dispatch raised: %s", exc) + finally: + # Only clear if it is still the task we awaited — a re-entrant + # launch could have replaced it (it cannot today: the loop is the + # sole launcher and awaits here first, but be defensive). + if self._dispatch_task is task: + self._dispatch_task = None async def _dispatch_turn(self, transcript_text: str) -> None: """Run the post-commit pipeline (record STT → afterTranscribe → - LLM dispatch → TTS → turn-complete) inline on the STT loop. 
+ LLM dispatch → TTS → turn-complete) as a tracked background task so + the STT receive loop keeps draining transcripts during the turn. """ # Reset the per-turn LLM cancel event BEFORE dispatch so a stale # cancel set by a previous turn's barge-in (``_do_cancel_for_barge_in`` @@ -4038,11 +4103,25 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: self.metrics.record_tts_stopped() self.metrics.record_turn_interrupted() self._is_speaking = False + self._tail_grace_active = False self._speaking_started_at = None self._first_audio_sent_at = None self._speaking_generation += 1 self._last_cancel_at = time.time() self._suppressed_speech_pending = False + # Tear down the in-flight LLM stream too. The + # consumption loop polls ``_llm_cancel_event`` + # per chunk, but a turn parked PRE-first-token + # on a hung agent request never sees a chunk — + # the provider cancel watchdog (see + # ``OpenAICompatibleLLMProvider.stream``) closes + # the request the instant this fires. Parity + # with TS ``cancelSpeaking`` → ``llmAbort.abort``. + cancel_event = getattr( + self, "_llm_cancel_event", None + ) + if cancel_event is not None: + cancel_event.set() if not phantom_suppressed and self.metrics is not None: # Industry-standard pattern: every legitimate VAD speech_start # re-anchors the turn timestamp pre-commit. This @@ -4098,7 +4177,16 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: # post-barge-in bleed-transcription entry. if len(self._inbound_audio_ring) > 13: # ~260 ms at 20 ms/frame self._inbound_audio_ring.pop(0) - return + # Opt-in: also forward the frame to STT during TTS so the + # transcript barge-in path can receive a transcript on + # echo-masked links where the VAD never fires. The ring push + # above stays unconditional (leading-edge recovery preserved); + # only the early-return is gated. ECHO RISK without AEC — the + # agent's own voice may be transcribed as a phantom + # interruption; pair with agent.barge_in_strategies. 
Default + # OFF → byte-identical push-and-return. + if not self._forward_stt_while_speaking: + return # before_send_to_stt hook — gate/transform the audio chunk before it # reaches the STT provider. Returning None drops the chunk (useful @@ -4648,6 +4736,18 @@ async def cleanup(self) -> None: _tts_cancel() except Exception: pass + # Hard-cancel the backgrounded turn dispatch (teardown backstop) so no + # orphan task touches a finalized handler. The cancel_event.set() above + # lets a post-first-token turn break gracefully; the cancel covers a + # turn parked pre-first-token on a hung agent request. + _dispatch_task = getattr(self, "_dispatch_task", None) + if _dispatch_task is not None and not _dispatch_task.done(): + _dispatch_task.cancel() + try: + await _dispatch_task + except (asyncio.CancelledError, Exception): + pass + self._dispatch_task = None # Drop any pending barge-in timeout BEFORE we tear down metrics / # adapters. Without this, a call that ends while a barge-in is # pending leaves an asyncio.Task scheduled to fire diff --git a/libraries/python/tests/test_llm_hermes_openclaw_presets.py b/libraries/python/tests/test_llm_hermes_openclaw_presets.py index c0a2a6a..1363452 100644 --- a/libraries/python/tests/test_llm_hermes_openclaw_presets.py +++ b/libraries/python/tests/test_llm_hermes_openclaw_presets.py @@ -33,7 +33,8 @@ def test_hermes_defaults_base_url_model_timeout(monkeypatch) -> None: llm = hermes.LLM() assert _base_url_str(llm).startswith("http://127.0.0.1:8642/v1") assert llm._model == "hermes-agent" - assert llm._client.timeout == 120.0 + assert llm._client.timeout.read == 120.0 + assert llm._client.timeout.connect == 10.0 # Hermes is stateless and keys continuity off HEADERS: # X-Hermes-Session-Id (per call) + optional X-Hermes-Session-Key (memory). 
assert llm._session_user_prefix == "patter-call-" @@ -113,7 +114,8 @@ def test_openclaw_defaults_match_consult_preset(monkeypatch) -> None: assert llm._session_user_prefix == "patter-call-" # OpenClaw has no separate memory-scope header. assert llm._session_key_header is None - assert llm._client.timeout == 120.0 + assert llm._client.timeout.read == 120.0 + assert llm._client.timeout.connect == 10.0 assert llm.provider_key == "openclaw" diff --git a/libraries/python/tests/test_llm_openai_compatible.py b/libraries/python/tests/test_llm_openai_compatible.py index ea00b11..b803c5c 100644 --- a/libraries/python/tests/test_llm_openai_compatible.py +++ b/libraries/python/tests/test_llm_openai_compatible.py @@ -27,7 +27,10 @@ def test_openai_compatible_provider_points_client_at_base_url_with_timeout() -> ) # Real client carries the base URL and the long (non-default) timeout. assert _base_url_str(provider).startswith("http://127.0.0.1:9/v1") - assert provider._client.timeout == 120.0 + # Long read budget for tool-running turns; connect bounded (~10 s) so a + # dead gateway fails fast instead of hanging the full read budget. + assert provider._client.timeout.read == 120.0 + assert provider._client.timeout.connect == 10.0 assert provider._model == "m" # Satisfies the LLMProvider protocol. assert isinstance(provider, LLMProvider) diff --git a/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py new file mode 100644 index 0000000..295e200 --- /dev/null +++ b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py @@ -0,0 +1,206 @@ +"""Backgrounded-dispatch barge-in: the STT receive loop keeps draining +transcripts during a long agent-runtime turn, so a barge-in (transcript OR +VAD) can interrupt the LIVE turn instead of being processed only after it ends. 
+ +Reproduces the live Hermes failure: while the assistant was speaking, the +caller said "ferma" but Patter only reacted after the turn finished — because +``_stt_loop`` awaited ``_dispatch_turn`` inline and stopped reading transcripts +for the whole (30-90 s) turn. Covers: + +* the decoupled dispatch + transcript barge-in cancelling the in-flight turn; +* the VAD legacy branch now setting ``_llm_cancel_event`` (pre-first-token + teardown parity with TS); +* the opt-in ``PATTER_FORWARD_STT_WHILE_SPEAKING`` guard. + +Only the external boundary (LLM stream timing, TTS bytes, STT) is faked. +""" + +from __future__ import annotations + +import asyncio +import time +from collections import deque +from typing import AsyncIterator +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from getpatter.providers.base import Transcript, VADEvent +from getpatter.stream_handler import PipelineStreamHandler + +from tests.conftest import make_agent + + +class _FakeTTS: + output_format = "pcm_16000" + + def __init__(self) -> None: + self.synthesized: list[str] = [] + + async def synthesize(self, text: str): + self.synthesized.append(text) + yield b"\x00\x00" * 80 + + +class _ScriptedVAD: + def __init__(self, events: list[VADEvent | None]) -> None: + self._events = list(events) + + async def process_frame(self, pcm: bytes, sample_rate: int) -> VADEvent | None: + return self._events.pop(0) if self._events else None + + async def close(self) -> None: # pragma: no cover + pass + + def reset(self) -> None: + pass + + +def _make_handler(*, metrics: MagicMock | None = None) -> PipelineStreamHandler: + handler = PipelineStreamHandler( + agent=make_agent(), + audio_sender=AsyncMock(), + call_id="call-bg", + caller="+15551110000", + callee="+15552220000", + resolved_prompt="p", + metrics=metrics, + for_twilio=True, + on_transcript=None, + conversation_history=deque(maxlen=20), + transcript_entries=deque(maxlen=20), + ) + handler.on_message = None + handler._tts = _FakeTTS() # 
type: ignore[assignment] + handler._stt = AsyncMock() + handler._aec = None + handler._input_is_mulaw_8k = False + return handler + + +_FRAME = b"\x00\x01" * 160 + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestTranscriptBargeInDuringInFlightTurn: + """A barge-in transcript cancels the LIVE turn — proving the receive loop + is no longer blocked on dispatch.""" + + async def test_bargein_transcript_cancels_inflight_long_turn(self) -> None: + metrics = MagicMock() + handler = _make_handler(metrics=metrics) + # Past the warmup gate so barge-in is allowed. + handler._can_barge_in = lambda: True # type: ignore[assignment] + + cancel_seen = asyncio.Event() + + class _ParkUntilCancelLoop: + def __init__(self) -> None: + self.calls = 0 + + def run(self, text, history, ctx, *, cancel_event=None, **kw): + self.calls += 1 + first = self.calls == 1 + + async def _gen(): + if first: + # Long turn: only ends when the barge-in sets cancel. + while cancel_event is None or not cancel_event.is_set(): + await asyncio.sleep(0.005) + cancel_seen.set() + return + yield "ok " # any later turn replies quickly + + return _gen() + + handler._llm_loop = _ParkUntilCancelLoop() + + class _STT: + async def receive_transcripts(self) -> AsyncIterator[Transcript]: + yield Transcript(text="dimmi una storia", is_final=True, confidence=0.9) + # Let turn 1 begin_speaking + park on its (long) LLM stream. + await asyncio.sleep(0.08) + # Caller barges in WHILE the agent turn is in flight. + yield Transcript(text="ferma per favore", is_final=True, confidence=0.9) + await asyncio.sleep(0.08) + + handler._stt = _STT() # type: ignore[assignment] + + await asyncio.wait_for(handler._stt_loop(), timeout=3.0) + + # The barge-in fired DURING turn 1 (the loop kept reading transcripts): + handler.audio_sender.send_clear.assert_awaited() + metrics.record_bargein_detected.assert_called() + # Turn 1's LLM stream observed the cancel (it was torn down, not left + # running until the next turn). 
+ assert cancel_seen.is_set() + assert handler._is_speaking is False + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestVadLegacyBranchSetsCancelEvent: + """A VAD-only barge-in during TTS tears down the LLM stream (layer 2a).""" + + async def test_vad_speech_start_sets_llm_cancel_event(self) -> None: + metrics = MagicMock() + handler = _make_handler(metrics=metrics) + handler._auto_vad = _ScriptedVAD([VADEvent(type="speech_start")]) + handler._is_speaking = True + handler._tail_grace_active = False + handler._speaking_generation = 1 + handler._speaking_started_at = time.time() - 2.0 + handler._first_audio_sent_at = time.time() - 2.0 + handler._inbound_audio_ring = [] + assert handler._llm_cancel_event.is_set() is False + + await handler.on_audio_received(_FRAME) + + # Real barge-in cancel ran AND the LLM stream cancel was signalled + # (previously only `_is_speaking` flipped, which Hermes never observed + # pre-first-token). + metrics.record_bargein_detected.assert_called_once() + assert handler._is_speaking is False + assert handler._llm_cancel_event.is_set() is True + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestForwardSttWhileSpeakingFlag: + """``PATTER_FORWARD_STT_WHILE_SPEAKING`` gates audio-to-STT during TTS.""" + + async def test_flag_off_buffers_and_returns(self, monkeypatch) -> None: + monkeypatch.delenv("PATTER_FORWARD_STT_WHILE_SPEAKING", raising=False) + handler = _make_handler() + handler._auto_vad = _ScriptedVAD([None, None]) + handler._stt = AsyncMock() + handler._is_speaking = True + handler._tail_grace_active = False + handler._inbound_audio_ring = [] + + await handler.on_audio_received(_FRAME) + await handler.on_audio_received(_FRAME) + + # Default: audio withheld from STT during TTS, only ring-buffered. 
+ assert handler._stt.send_audio.await_count == 0 + assert len(handler._inbound_audio_ring) == 2 + + async def test_flag_on_forwards_to_stt_during_tts(self, monkeypatch) -> None: + monkeypatch.setenv("PATTER_FORWARD_STT_WHILE_SPEAKING", "1") + handler = _make_handler() + assert handler._forward_stt_while_speaking is True + handler._auto_vad = _ScriptedVAD([None, None]) + handler._stt = AsyncMock() + handler._is_speaking = True + handler._tail_grace_active = False + handler._inbound_audio_ring = [] + + await handler.on_audio_received(_FRAME) + await handler.on_audio_received(_FRAME) + + # Flag on: audio ALSO reaches STT during TTS (so the transcript barge-in + # path can fire on echo-masked links) AND the ring still captures the + # leading edge for flush-on-barge-in. + assert handler._stt.send_audio.await_count == 2 + assert len(handler._inbound_audio_ring) == 2 diff --git a/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py b/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py new file mode 100644 index 0000000..13391ee --- /dev/null +++ b/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py @@ -0,0 +1,152 @@ +"""Pre-first-token cancel abort for agent-runtime LLM providers. + +Hermes / OpenClaw run tools/memory/skills for tens of seconds BEFORE the first +SSE byte. The per-chunk ``cancel_event.is_set()`` check inside ``async for chunk +in response`` never runs during that window (the consumer is parked awaiting the +first byte), so a barge-in could not free the connection and the next user turn +blocked behind it. The provider now races ``create()`` + first-byte against the +cancel event and spawns a watchdog that ``close()``s the response the instant +the event fires, returning promptly without yielding. + +Only the external boundary is mocked: a fake AsyncOpenAI client whose streaming +response parks on an event until ``close()`` is called. 
+""" + +from __future__ import annotations + +import asyncio +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from getpatter.llm.openai_compatible import OpenAICompatibleLLMProvider + + +class _ParkingResponse: + """Async-iterable streaming response that parks on the first ``__anext__`` + until ``close()`` is called — modelling Hermes holding the connection open + while it runs tools before the first token.""" + + def __init__(self) -> None: + self._closed = asyncio.Event() + self.close_calls = 0 + self.yielded_any = False + + def __aiter__(self): + return self + + async def __anext__(self): + await self._closed.wait() + # A read after the httpx stream is closed raises — model that so the + # provider's except-branch (swallow-when-cancelling) is exercised. + raise RuntimeError("stream closed mid-read") + + async def close(self) -> None: + self.close_calls += 1 + self._closed.set() + + +def _provider_with_fake_client(response: _ParkingResponse) -> OpenAICompatibleLLMProvider: + provider = OpenAICompatibleLLMProvider(base_url="http://127.0.0.1:9/v1", model="m") + fake_client = MagicMock() + fake_client.chat.completions.create = AsyncMock(return_value=response) + provider._client = fake_client # type: ignore[assignment] + return provider + + +@pytest.mark.mocked +@pytest.mark.asyncio +async def test_cancel_event_closes_response_before_first_token() -> None: + provider = _provider_with_fake_client(resp := _ParkingResponse()) + cancel = asyncio.Event() + chunks: list = [] + + async def _consume() -> None: + async for chunk in provider.stream( + [{"role": "user", "content": "hi"}], cancel_event=cancel + ): + chunks.append(chunk) + + task = asyncio.create_task(_consume()) + # Let it reach the parked first __anext__. + await asyncio.sleep(0.05) + assert chunks == [] # parked pre-first-token + + # Barge-in: the watchdog must close the response and stream() must return + # promptly WITHOUT yielding or raising. 
+ cancel.set() + await asyncio.wait_for(task, timeout=1.0) + + assert resp.close_calls >= 1 # watchdog tore the request down + assert chunks == [] # nothing spoken + + +@pytest.mark.mocked +@pytest.mark.asyncio +async def test_cancel_during_create_aborts_in_flight_post() -> None: + """If the cancel fires while ``create()`` itself is still awaiting (the + server hasn't even sent headers), the in-flight POST is cancelled and + stream() returns nothing — no response object, no yield.""" + provider = OpenAICompatibleLLMProvider(base_url="http://127.0.0.1:9/v1", model="m") + cancel = asyncio.Event() + create_started = asyncio.Event() + + async def _never_returns(**_kwargs): + create_started.set() + await asyncio.Event().wait() # parks forever (server never responds) + + fake_client = MagicMock() + fake_client.chat.completions.create = _never_returns + provider._client = fake_client # type: ignore[assignment] + + chunks: list = [] + + async def _consume() -> None: + async for chunk in provider.stream( + [{"role": "user", "content": "hi"}], cancel_event=cancel + ): + chunks.append(chunk) + + task = asyncio.create_task(_consume()) + await asyncio.wait_for(create_started.wait(), timeout=1.0) + cancel.set() + await asyncio.wait_for(task, timeout=1.0) + assert chunks == [] + + +@pytest.mark.mocked +@pytest.mark.asyncio +async def test_no_cancel_event_streams_normally() -> None: + """Regression guard: with no cancel_event the watchdog is never spawned and + a normal streamed response yields its text unchanged.""" + + class _Chunk: + def __init__(self, content): + self.usage = None + self.choices = [ + MagicMock(delta=MagicMock(content=content, tool_calls=None)) + ] + + class _OneShot: + def __init__(self): + self._items = [_Chunk("Hello "), _Chunk("there.")] + + def __aiter__(self): + return self + + async def __anext__(self): + if not self._items: + raise StopAsyncIteration + return self._items.pop(0) + + provider = 
OpenAICompatibleLLMProvider(base_url="http://127.0.0.1:9/v1", model="m") + fake_client = MagicMock() + fake_client.chat.completions.create = AsyncMock(return_value=_OneShot()) + provider._client = fake_client # type: ignore[assignment] + + texts = [ + c["content"] + async for c in provider.stream([{"role": "user", "content": "hi"}]) + if c.get("type") == "text" + ] + assert texts == ["Hello ", "there."] diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index fefd056..23d1f51 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -1021,6 +1021,25 @@ export class StreamHandler { private maxDurationTimer: ReturnType | null = null; private transcriptProcessing = false; private transcriptQueue: STTTranscript[] = []; + /** + * The in-flight turn dispatch (LLM + TTS) runs as a SINGLE tracked promise + * so the transcript drain loop keeps running ``handleBargeIn`` against the + * LIVE turn during a long (30-90 s) agent-runtime response, instead of + * head-of-line-blocking on it. Exactly one is in flight: the launcher awaits + * the previous one to settle (fast — a barge-in already aborted it) before + * starting the next, preserving history/metrics ordering. Parity with + * Python ``_dispatch_task``. + */ + private dispatchTask: Promise<void> | null = null; + /** + * Opt-in (default OFF): forward inbound audio to STT even while the agent is + * speaking, so the transcript barge-in path can receive a transcript on + * echo-masked PSTN links where the VAD never fires. ECHO RISK without AEC. + * Parity with Python ``_forward_stt_while_speaking``. + */ + private readonly forwardSttWhileSpeaking = ['1', 'true', 'yes'].includes( + (process.env.PATTER_FORWARD_STT_WHILE_SPEAKING ?? '').trim().toLowerCase(), + ); // Throttle state for back-to-back STT finals — see ``commitTranscript``.
private lastCommitText = ''; private lastCommitAt = 0; @@ -1046,6 +1065,15 @@ export class StreamHandler { this.caller = caller; this.callee = callee; + if (this.forwardSttWhileSpeaking) { + getLogger().warn( + 'PATTER_FORWARD_STT_WHILE_SPEAKING=on: inbound audio is sent to STT ' + + 'during TTS so transcript barge-in works on echo-masked links. ' + + "Without AEC the agent's own voice may be transcribed as a phantom " + + 'interruption — pair with agent.bargeInStrategies.', + ); + } + this.bargeInStrategies = (deps.agent.bargeInStrategies ?? []).slice(); const confirmMs = deps.agent.bargeInConfirmMs; this.bargeInConfirmMs = @@ -1611,9 +1639,15 @@ export class StreamHandler { ) { this.inboundAudioRing.shift(); } + // Opt-in: also forward the frame to STT during TTS so the transcript + // barge-in path can receive a transcript on echo-masked links where + // the VAD never fires. The ring push above stays unconditional + // (leading-edge recovery preserved); only the early-return is gated. + // ECHO RISK without AEC. Default OFF → byte-identical push-and-return. + if (!this.forwardSttWhileSpeaking) return; + } else if ((this.deps.agent.bargeInThresholdMs ?? 300) === 0) { return; } - if ((this.deps.agent.bargeInThresholdMs ?? 300) === 0) return; } // beforeSendToStt hook — gate/transform the audio chunk before it @@ -1738,6 +1772,10 @@ export class StreamHandler { if (typeof ttsCancelable?.cancelActiveStream === 'function') { try { ttsCancelable.cancelActiveStream(); } catch { /* defensive */ } } + // Settle the backgrounded turn dispatch (the abort above unblocks it) so + // no in-flight LLM/TTS work touches adapters after they close. Parity with + // Python cleanup awaiting ``_dispatch_task``. + await this.dispatchTask?.catch(() => {}); // Drop any pending barge-in timer BEFORE we tear down metrics / // adapters. 
Without this, a call that ends while a barge-in is // pending leaves a setTimeout scheduled to fire ``bargeInConfirmMs`` @@ -1775,6 +1813,9 @@ export class StreamHandler { if (typeof ttsCancelable?.cancelActiveStream === 'function') { try { ttsCancelable.cancelActiveStream(); } catch { /* defensive */ } } + // Settle the backgrounded turn dispatch before tearing down adapters + // (parity with handleStop / Python cleanup). + await this.dispatchTask?.catch(() => {}); // See handleStop — drop pending barge-in timer before cleanup so a // dead handler can never fire a stale recordOverlapEnd callback. this.clearPendingBargeIn(); @@ -2493,84 +2534,114 @@ export class StreamHandler { // Push filtered text to history (after hook, so LLM sees redacted/modified text) this.history.push({ role: 'user', text: filteredTranscript, timestamp: Date.now() }); - let responseText = ''; - // Wave6B: record that the transcript is being committed to the LLM. // onUserTurnCompleted hook is not yet wired in TS — record 0 delay so EOU can still emit. this.metricsAcc.recordOnUserTurnCompletedDelay(0); this.metricsAcc.recordTurnCommitted(); closeEndpointSpan(); - if (this.deps.onMessage && typeof this.deps.onMessage === 'function') { - try { - responseText = await this.deps.onMessage({ + // Settle the previous turn first (single-in-flight). It is either already + // done, or this transcript's handleBargeIn above just aborted it — so this + // await is fast and does not head-of-line-block the drain loop in + // practice, while preserving strict per-turn history/metrics ordering. + await this.dispatchTask?.catch(() => {}); + // Launch the turn as a tracked background task and RETURN immediately so + // the transcript drain loop keeps running handleBargeIn against this LIVE + // turn (the head-of-line-blocking fix). Parity with Python + // ``create_task(_dispatch_turn(...))``. 
+ this.dispatchTask = this.dispatchTurn(filteredTranscript, hookExecutor, hookCtx, interrupted); + } + + /** + * Post-commit turn body (LLM dispatch → TTS → turn-complete) run as a + * tracked background task so the transcript drain loop is not blocked for + * the whole (possibly 30-90 s) agent-runtime turn. A barge-in — transcript + * (now reachable mid-turn) or VAD — aborts the in-flight ``llmAbort`` and + * flips ``isSpeaking``, which the LLM/TTS loops here observe and break on. + * Parity with Python ``_dispatch_turn``. + */ + private async dispatchTurn( + filteredTranscript: string, + hookExecutor: PipelineHookExecutor, + hookCtx: HookContext, + interrupted: boolean, + ): Promise<void> { + const label = this.deps.bridge.label; + let responseText = ''; + try { + if (this.deps.onMessage && typeof this.deps.onMessage === 'function') { + try { + responseText = await this.deps.onMessage({ + text: filteredTranscript, + call_id: this.callId, + caller: this.caller, + callee: this.callee, + history: [...this.history.entries], + }); + } catch (e) { + getLogger().error(`onMessage error (${label}):`, e); + return; + } + if (!responseText) { + // Common misuse: onMessage was provided as an observer (returning void) + // but it actually replaces the built-in LLM loop. Warn loudly — the caller + // will hear no audio until the handler returns a non-empty string. + getLogger().warn( + `onMessage returned empty/void (${label}) — no TTS will play.
` + + `If you intended to observe transcripts, use onTranscript instead; ` + + `if you meant to answer via the built-in LLM, remove onMessage and pass openaiKey.`, + ); + } + } else if (this.deps.onMessage && isRemoteUrl(this.deps.onMessage)) { + const msgData = { text: filteredTranscript, call_id: this.callId, caller: this.caller, callee: this.callee, history: [...this.history.entries], - }); - } catch (e) { - getLogger().error(`onMessage error (${label}):`, e); - return; - } - if (!responseText) { - // Common misuse: onMessage was provided as an observer (returning void) - // but it actually replaces the built-in LLM loop. Warn loudly — the caller - // will hear no audio until the handler returns a non-empty string. + }; + if (isWebSocketUrl(this.deps.onMessage)) { + await this.handleWebSocketResponse(msgData); + return; + } + try { + responseText = await this.deps.remoteHandler.callWebhook(this.deps.onMessage, msgData); + } catch (e) { + getLogger().error(`Webhook remote error (${label}):`, e); + return; + } + } else if (this.llmLoop) { + responseText = await this.runPipelineLlm(filteredTranscript, hookExecutor, hookCtx); + } else { getLogger().warn( - `onMessage returned empty/void (${label}) — no TTS will play. ` + - `If you intended to observe transcripts, use onTranscript instead; ` + - `if you meant to answer via the built-in LLM, remove onMessage and pass openaiKey.`, + `Pipeline (${label}) has no llm/onMessage handler — transcript ` + + `"${sanitizeLogValue(filteredTranscript.slice(0, 60))}" dropped. 
` + + 'Check that agent.llm or onMessage is configured.', ); - } - } else if (this.deps.onMessage && isRemoteUrl(this.deps.onMessage)) { - const msgData = { - text: filteredTranscript, - call_id: this.callId, - caller: this.caller, - callee: this.callee, - history: [...this.history.entries], - }; - if (isWebSocketUrl(this.deps.onMessage)) { - await this.handleWebSocketResponse(msgData); return; } - try { - responseText = await this.deps.remoteHandler.callWebhook(this.deps.onMessage, msgData); - } catch (e) { - getLogger().error(`Webhook remote error (${label}):`, e); - return; - } - } else if (this.llmLoop) { - responseText = await this.runPipelineLlm(filteredTranscript, hookExecutor, hookCtx); - } else { - getLogger().warn( - `Pipeline (${label}) has no llm/onMessage handler — transcript ` + - `"${sanitizeLogValue(filteredTranscript.slice(0, 60))}" dropped. ` + - 'Check that agent.llm or onMessage is configured.', - ); - return; - } - if (!responseText) return; + if (!responseText) return; - if (this.llmLoop) { - await this.emitAssistantTranscript(responseText); - this.metricsAcc.recordTtsComplete(responseText); - } else { - interrupted = await this.runRegularLlm(responseText, hookExecutor, hookCtx) || interrupted; - // ``runRegularLlm`` returns the possibly-replaced text via side effect on - // history; recompute responseText from the last history entry for the - // turn-complete record. - responseText = this.history.entries[this.history.entries.length - 1]?.text ?? responseText; - } - - // Skip turn-complete when barge-in already recorded the turn as - // interrupted — mirrors Python ``if not interrupted``. Prevents - // double-counting / turn-count inflation / polluting p95. 
- if (!interrupted) { - await this.emitTurnMetrics(this.metricsAcc.recordTurnComplete(responseText)); + if (this.llmLoop) { + await this.emitAssistantTranscript(responseText); + this.metricsAcc.recordTtsComplete(responseText); + } else { + interrupted = (await this.runRegularLlm(responseText, hookExecutor, hookCtx)) || interrupted; + // ``runRegularLlm`` returns the possibly-replaced text via side effect on + // history; recompute responseText from the last history entry for the + // turn-complete record. + responseText = this.history.entries[this.history.entries.length - 1]?.text ?? responseText; + } + + // Skip turn-complete when barge-in already recorded the turn as + // interrupted — mirrors Python ``if not interrupted``. Prevents + // double-counting / turn-count inflation / polluting p95. + if (!interrupted) { + await this.emitTurnMetrics(this.metricsAcc.recordTurnComplete(responseText)); + } + } finally { + this.dispatchTask = null; } } diff --git a/libraries/typescript/tests/long-turn-filler.mocked.test.ts b/libraries/typescript/tests/long-turn-filler.mocked.test.ts index 2729413..bca84d6 100644 --- a/libraries/typescript/tests/long-turn-filler.mocked.test.ts +++ b/libraries/typescript/tests/long-turn-filler.mocked.test.ts @@ -214,6 +214,12 @@ describe('[mocked] pipeline long-turn filler (longTurnMessage)', () => { await vi.waitFor(() => expect(ttsCalls).toContain(FILLER), { timeout: 5000, }); + // Dispatch now runs as a backgrounded task (so the STT loop can barge-in + // mid-turn) — await it so the real reply has been synthesized before we + // assert on the full TTS sequence. + await (handler as unknown as { dispatchTask: Promise | null }).dispatchTask?.catch( + () => {}, + ); // The filler was spoken FIRST, then the real reply — exactly one filler. 
expect(ttsCalls.indexOf(FILLER)).toBe(0); expect(ttsCalls).toContain('Here is your answer.'); diff --git a/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts new file mode 100644 index 0000000..d23c3e6 --- /dev/null +++ b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts @@ -0,0 +1,187 @@ +/** + * [mocked] Backgrounded-dispatch barge-in (parity with Python + * test_pipeline_bargein_backgrounded.py). + * + * The transcript drain loop must keep running ``handleBargeIn`` DURING a long + * agent-runtime turn — so a caller who speaks over a slow (30-90 s, tool- + * running) Hermes/OpenClaw response actually interrupts it, instead of being + * answered only after the turn finishes. Previously ``processTranscript`` + * awaited ``runPipelineLlm`` inline and stopped draining transcripts for the + * whole turn (head-of-line blocking). + * + * AUTHENTIC: the real StreamHandler + LLMLoop + pipeline turn path. Mocked only + * at the external boundary: the LLM provider stream (parks until aborted, like + * Hermes pre-first-token) and the TTS byte stream. 
+ */ + +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { StreamHandler } from '../src/stream-handler'; +import type { TelephonyBridge, StreamHandlerDeps } from '../src/stream-handler'; +import { MetricsStore } from '../src/dashboard/store'; +import { RemoteMessageHandler } from '../src/remote-message'; +import type { AgentOptions } from '../src/types'; +import type { LLMProvider, LLMChunk, LLMStreamOptions } from '../src/llm-loop'; +import type { WebSocket as WSWebSocket } from 'ws'; + +vi.mock('../src/providers/elevenlabs-tts', async (importOriginal) => { + const original = + await importOriginal(); + return { + ...original, + ElevenLabsTTS: vi.fn().mockImplementation(() => ({ + synthesizeStream: vi.fn(async function* () { + yield Buffer.from('tts-audio'); + }), + })), + }; +}); +vi.mock('../src/dashboard/persistence', () => ({ notifyDashboard: vi.fn() })); + +import { ElevenLabsTTS } from '../src/providers/elevenlabs-tts'; + +function makeMockWs(): WSWebSocket { + return { + send: vi.fn(), close: vi.fn(), on: vi.fn(), once: vi.fn(), readyState: 1, + removeListener: vi.fn(), addEventListener: vi.fn(), removeEventListener: vi.fn(), + } as unknown as WSWebSocket; +} + +function makeMockStt() { + let cb: ((t: { isFinal?: boolean; text?: string }) => Promise) | undefined; + return { + connect: vi.fn().mockResolvedValue(undefined), + close: vi.fn(), + sendAudio: vi.fn(), + onTranscript: vi.fn((fn: (t: { isFinal?: boolean; text?: string }) => Promise) => { + cb = fn; + }), + get requestId() { return 'stt-bg-req'; }, + emitTranscript(text: string): Promise | undefined { + return cb?.({ isFinal: true, text }); + }, + }; +} + +function makeTwilioBridge(mockStt: ReturnType): TelephonyBridge { + return { + label: 'Twilio', telephonyProvider: 'twilio', + sendAudio: vi.fn(), sendMark: vi.fn(), sendClear: vi.fn(), + transferCall: vi.fn().mockResolvedValue(undefined), + endCall: vi.fn().mockResolvedValue(undefined), + createStt: 
vi.fn().mockReturnValue(mockStt), + queryTelephonyCost: vi.fn().mockResolvedValue(undefined), + } as unknown as TelephonyBridge; +} + +/** Provider that parks until its turn is aborted — models a long Hermes turn + * (tools running before the first token) that only ends on barge-in. */ +function makeParkUntilAbortProvider(aborted: { value: boolean }): LLMProvider { + return { + model: 'agent-runtime-1', + async *stream( + _messages: Array>, + _tools?: Array> | null, + opts?: LLMStreamOptions, + ): AsyncGenerator { + const signal = opts?.signal; + await new Promise((resolve) => { + if (signal?.aborted) return resolve(); + signal?.addEventListener('abort', () => resolve(), { once: true }); + }); + aborted.value = true; + // Yield nothing after abort — the turn was interrupted pre-first-token. + }, + } as unknown as LLMProvider; +} + +function makeDeps(bridge: TelephonyBridge, agentOverrides: Partial): StreamHandlerDeps { + const mockTts = new (ElevenLabsTTS as unknown as new (k: string, v?: string) => { + synthesizeStream: (t: string) => AsyncIterable; + })('el-key', 'rachel'); + const agent: AgentOptions = { + systemPrompt: 'You are a test pipeline agent.', + provider: 'pipeline', + tts: mockTts as unknown as AgentOptions['tts'], + ...agentOverrides, + } as AgentOptions; + return { + config: {}, agent, bridge, + metricsStore: new MetricsStore(), + pricing: null, + remoteHandler: new RemoteMessageHandler(), + recording: false, + buildAIAdapter: vi.fn(), + sanitizeVariables: vi.fn((raw: Record) => raw), + resolveVariables: vi.fn((tpl: string) => tpl), + } as unknown as StreamHandlerDeps; +} + +describe('[mocked] pipeline backgrounded-dispatch barge-in', () => { + beforeEach(() => { + vi.spyOn(globalThis, 'fetch').mockResolvedValue({ + ok: true, status: 200, json: async () => ({}), text: async () => '', + } as unknown as Response); + }); + afterEach(() => { + vi.restoreAllMocks(); + delete process.env.PATTER_FORWARD_STT_WHILE_SPEAKING; + }); + + it('a barge-in 
transcript cancels the in-flight long turn (loop not blocked)', async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + const aborted = { value: false }; + const deps = makeDeps(bridge, { + llm: makeParkUntilAbortProvider(aborted) as unknown as AgentOptions['llm'], + }); + const handler = new StreamHandler(deps, makeMockWs(), '+15551111111', '+15552222222'); + // Past the warmup gate so the barge-in is allowed to fire. + (handler as unknown as { canBargeIn: () => boolean }).canBargeIn = () => true; + + await handler.handleCallStart('CA-bg-bargein'); + + // Turn 1 starts and parks on its (long) LLM stream. + await stt.emitTranscript('dimmi una storia lunga'); + await vi.waitFor( + () => expect((handler as unknown as { isSpeaking: boolean }).isSpeaking).toBe(true), + { timeout: 3000 }, + ); + + // Caller barges in WHILE turn 1 is in flight. With the old inline-await + // this transcript would not be read until turn 1 ended. + await stt.emitTranscript('ferma per favore'); + + // The in-flight turn was cancelled: the carrier buffer was cleared and the + // LLM stream's abort signal fired (turn torn down pre-first-token). + await vi.waitFor( + () => expect(bridge.sendClear as ReturnType).toHaveBeenCalled(), + { timeout: 3000 }, + ); + expect((handler as unknown as { isSpeaking: boolean }).isSpeaking).toBe(false); + await vi.waitFor(() => expect(aborted.value).toBe(true), { timeout: 3000 }); + + // Settle the backgrounded dispatch before teardown. 
+ await (handler as unknown as { dispatchTask: Promise | null }).dispatchTask?.catch( + () => {}, + ); + }, 10000); + + it('PATTER_FORWARD_STT_WHILE_SPEAKING is read from env (default off)', () => { + const offBridge = makeTwilioBridge(makeMockStt()); + const offHandler = new StreamHandler( + makeDeps(offBridge, {}), makeMockWs(), '+1', '+2', + ); + expect( + (offHandler as unknown as { forwardSttWhileSpeaking: boolean }).forwardSttWhileSpeaking, + ).toBe(false); + + process.env.PATTER_FORWARD_STT_WHILE_SPEAKING = '1'; + const onBridge = makeTwilioBridge(makeMockStt()); + const onHandler = new StreamHandler( + makeDeps(onBridge, {}), makeMockWs(), '+1', '+2', + ); + expect( + (onHandler as unknown as { forwardSttWhileSpeaking: boolean }).forwardSttWhileSpeaking, + ).toBe(true); + }); +}); From 762ac6ba102bd13ca640a399e8d807d9ef431787 Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Sun, 7 Jun 2026 11:26:51 +0200 Subject: [PATCH 04/11] =?UTF-8?q?fix(pipeline):=20harden=20backgrounded=20?= =?UTF-8?q?barge-in=20=E2=80=94=20history=20snapshot,=20abort-on-cancel,?= =?UTF-8?q?=20bounded=20teardown?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three defects found by adversarial review of the previous commit's decoupled-dispatch barge-in, all fixed with full Python/TS parity: 1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the turn dispatch backgrounded, a following transcript's user push (on the drain loop while the turn is in flight) could land in the in-flight turn's prompt before buildMessages read it — conflating two turns. Now a history SNAPSHOT is captured at launch and threaded through dispatchTurn → runPipelineLlm → llmLoop.run (and the onMessage/webhook paths), mirroring Python's list(self.conversation_history). Regression test added. 2. 
(MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked pre-first-token, asyncio.wait did not cancel the in-flight create() POST, orphaning the Hermes/OpenClaw connection ("Task exception was never retrieved"). _open_stream_with_cancel now catches CancelledError and aborts the create task. Test added. 3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with no timeout — a hung user onMessage (no AbortSignal) could block call teardown indefinitely. Teardown now bounds the wait via settleDispatchForTeardown (DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task. Python 2220 / TypeScript 1766 pass; tsc + build clean. --- .../python/getpatter/services/llm_loop.py | 13 ++++ .../unit/test_provider_prefirsttoken_abort.py | 40 ++++++++++ libraries/typescript/src/stream-handler.ts | 73 ++++++++++++++++--- ...peline-bargein-backgrounded.mocked.test.ts | 57 +++++++++++++++ 4 files changed, 172 insertions(+), 11 deletions(-) diff --git a/libraries/python/getpatter/services/llm_loop.py b/libraries/python/getpatter/services/llm_loop.py index c5bd6ec..5c038a2 100644 --- a/libraries/python/getpatter/services/llm_loop.py +++ b/libraries/python/getpatter/services/llm_loop.py @@ -686,6 +686,19 @@ async def _open_stream_with_cancel(self, kwargs: dict, cancel_event): await asyncio.wait( {create_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED ) + except asyncio.CancelledError: + # The containing dispatch task was hard-cancelled (cleanup / + # hangup teardown) while parked here pre-first-token. + # ``asyncio.wait`` does NOT cancel the futures it waits on, so abort + # the in-flight POST to free the Hermes/OpenClaw connection instead + # of orphaning it (which would later raise "Task exception was never + # retrieved" when the abandoned request errors). 
+ create_task.cancel() + try: + await create_task + except BaseException: # noqa: BLE001 - aborting in-flight request + pass + raise finally: cancel_task.cancel() if not create_task.done(): diff --git a/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py b/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py index 13391ee..80daee9 100644 --- a/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py +++ b/libraries/python/tests/unit/test_provider_prefirsttoken_abort.py @@ -114,6 +114,46 @@ async def _consume() -> None: assert chunks == [] +@pytest.mark.mocked +@pytest.mark.asyncio +async def test_task_cancel_aborts_in_flight_create_no_orphan() -> None: + """When the containing dispatch task is hard-cancelled (cleanup / hangup) + while parked pre-first-token, the in-flight create() POST must be aborted — + not orphaned (which would later raise 'Task exception was never retrieved' + and leak the Hermes/OpenClaw connection).""" + provider = OpenAICompatibleLLMProvider(base_url="http://127.0.0.1:9/v1", model="m") + cancel = asyncio.Event() + create_started = asyncio.Event() + create_cancelled = {"value": False} + + async def _never_returns(**_kwargs): + create_started.set() + try: + await asyncio.Event().wait() # parks (server running tools) + except asyncio.CancelledError: + create_cancelled["value"] = True + raise + + fake_client = MagicMock() + fake_client.chat.completions.create = _never_returns + provider._client = fake_client # type: ignore[assignment] + + async def _consume() -> None: + async for _chunk in provider.stream( + [{"role": "user", "content": "hi"}], cancel_event=cancel + ): + pass + + task = asyncio.create_task(_consume()) + await asyncio.wait_for(create_started.wait(), timeout=1.0) + # Simulate cleanup() hard-cancelling _dispatch_task while parked pre-create. 
+ task.cancel() + with pytest.raises(asyncio.CancelledError): + await task + await asyncio.sleep(0) # let the abort propagate + assert create_cancelled["value"] is True + + @pytest.mark.mocked @pytest.mark.asyncio async def test_no_cancel_event_streams_normally() -> None: diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index 23d1f51..cbacbf3 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -1031,6 +1031,15 @@ export class StreamHandler { * Python ``_dispatch_task``. */ private dispatchTask: Promise | null = null; + /** + * Cap (ms) on how long teardown waits for the backgrounded dispatch to + * settle. JS promises are not cancellable, so a user-supplied ``onMessage`` + * (which receives no AbortSignal) parked on a hung external call could block + * call cleanup indefinitely — `llmAbort.abort()` only unblocks the built-in + * LLM/TTS paths. We bound the WAIT (Python hard-cancels the task instead). + * 30 s matches the webhook ceiling. + */ + private static readonly DISPATCH_SETTLE_TIMEOUT_MS = 30_000; /** * Opt-in (default OFF): forward inbound audio to STT even while the agent is * speaking, so the transcript barge-in path can receive a transcript on @@ -1752,6 +1761,27 @@ export class StreamHandler { } } + /** + * Await the backgrounded turn dispatch during teardown, but never block + * longer than ``DISPATCH_SETTLE_TIMEOUT_MS``. The earlier ``llmAbort.abort()`` + * settles the built-in LLM/TTS paths immediately; the cap only bites a + * misbehaving user ``onMessage`` parked on a hung external call (JS promises + * can't be cancelled). No-op when nothing is in flight. 
+ */ + private async settleDispatchForTeardown(): Promise { + if (!this.dispatchTask) return; + const settle = this.dispatchTask.catch(() => {}); + let timer: ReturnType | undefined; + const cap = new Promise((resolve) => { + timer = setTimeout(resolve, StreamHandler.DISPATCH_SETTLE_TIMEOUT_MS); + }); + try { + await Promise.race([settle, cap]); + } finally { + if (timer) clearTimeout(timer); + } + } + /** Handle call stop / stream end. */ /** Handle a carrier-emitted `stop` event signalling the call has ended. */ async handleStop(): Promise { @@ -1773,9 +1803,10 @@ export class StreamHandler { try { ttsCancelable.cancelActiveStream(); } catch { /* defensive */ } } // Settle the backgrounded turn dispatch (the abort above unblocks it) so - // no in-flight LLM/TTS work touches adapters after they close. Parity with - // Python cleanup awaiting ``_dispatch_task``. - await this.dispatchTask?.catch(() => {}); + // no in-flight LLM/TTS work touches adapters after they close — bounded so + // a hung user onMessage cannot block teardown. Parity with Python cleanup + // hard-cancelling ``_dispatch_task``. + await this.settleDispatchForTeardown(); // Drop any pending barge-in timer BEFORE we tear down metrics / // adapters. Without this, a call that ends while a barge-in is // pending leaves a setTimeout scheduled to fire ``bargeInConfirmMs`` @@ -1813,9 +1844,9 @@ export class StreamHandler { if (typeof ttsCancelable?.cancelActiveStream === 'function') { try { ttsCancelable.cancelActiveStream(); } catch { /* defensive */ } } - // Settle the backgrounded turn dispatch before tearing down adapters - // (parity with handleStop / Python cleanup). - await this.dispatchTask?.catch(() => {}); + // Settle the backgrounded turn dispatch before tearing down adapters, + // bounded so a hung user onMessage cannot block teardown (see handleStop). 
+ await this.settleDispatchForTeardown(); // See handleStop — drop pending barge-in timer before cleanup so a // dead handler can never fire a stale recordOverlapEnd callback. this.clearPendingBargeIn(); @@ -2545,11 +2576,24 @@ export class StreamHandler { // await is fast and does not head-of-line-block the drain loop in // practice, while preserving strict per-turn history/metrics ordering. await this.dispatchTask?.catch(() => {}); + // Snapshot history at launch — AFTER this turn's own user push above, BEFORE + // any later transcript can mutate it. The dispatch runs in the background, + // so passing the LIVE ``this.history.entries`` would let a following + // transcript's user push (which happens on the drain loop while this turn is + // in flight) contaminate this turn's LLM prompt. Mirrors Python's + // ``list(self.conversation_history)`` snapshot. + const historySnapshot = [...this.history.entries]; // Launch the turn as a tracked background task and RETURN immediately so // the transcript drain loop keeps running handleBargeIn against this LIVE // turn (the head-of-line-blocking fix). Parity with Python // ``create_task(_dispatch_turn(...))``. 
- this.dispatchTask = this.dispatchTurn(filteredTranscript, hookExecutor, hookCtx, interrupted); + this.dispatchTask = this.dispatchTurn( + filteredTranscript, + hookExecutor, + hookCtx, + interrupted, + historySnapshot, + ); } /** @@ -2565,6 +2609,7 @@ export class StreamHandler { hookExecutor: PipelineHookExecutor, hookCtx: HookContext, interrupted: boolean, + historySnapshot: Array<{ role: string; text: string }>, ): Promise { const label = this.deps.bridge.label; let responseText = ''; @@ -2576,7 +2621,7 @@ export class StreamHandler { call_id: this.callId, caller: this.caller, callee: this.callee, - history: [...this.history.entries], + history: historySnapshot, }); } catch (e) { getLogger().error(`onMessage error (${label}):`, e); @@ -2598,7 +2643,7 @@ export class StreamHandler { call_id: this.callId, caller: this.caller, callee: this.callee, - history: [...this.history.entries], + history: historySnapshot, }; if (isWebSocketUrl(this.deps.onMessage)) { await this.handleWebSocketResponse(msgData); @@ -2611,7 +2656,12 @@ export class StreamHandler { return; } } else if (this.llmLoop) { - responseText = await this.runPipelineLlm(filteredTranscript, hookExecutor, hookCtx); + responseText = await this.runPipelineLlm( + filteredTranscript, + hookExecutor, + hookCtx, + historySnapshot, + ); } else { getLogger().warn( `Pipeline (${label}) has no llm/onMessage handler — transcript ` + @@ -2909,6 +2959,7 @@ export class StreamHandler { filteredTranscript: string, hookExecutor: PipelineHookExecutor, hookCtx: HookContext, + historySnapshot: Array<{ role: string; text: string }>, ): Promise { const label = this.deps.bridge.label; const callCtx = { call_id: this.callId, caller: this.caller, callee: this.callee }; @@ -2972,7 +3023,7 @@ export class StreamHandler { try { for await (const token of this.llmLoop!.run( filteredTranscript, - this.history.entries, + historySnapshot, callCtx, this.metricsAcc, hookExecutor, diff --git 
a/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts index d23c3e6..91e0476 100644 --- a/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts +++ b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts @@ -94,6 +94,30 @@ function makeParkUntilAbortProvider(aborted: { value: boolean }): LLMProvider { } as unknown as LLMProvider; } +/** Captures the built messages (prompt) the first stream() call receives, then + * yields a quick reply — to assert the in-flight turn's prompt is built from a + * history SNAPSHOT and cannot be contaminated by a later transcript's push. */ +function makeMessageCapturingProvider(captured: { + messages?: Array<{ role: string; content: string }>; +}): LLMProvider { + return { + model: 'agent-runtime-1', + async *stream( + messages: Array>, + _tools?: Array> | null, + _opts?: LLMStreamOptions, + ): AsyncGenerator { + if (!captured.messages) { + captured.messages = JSON.parse(JSON.stringify(messages)) as Array<{ + role: string; + content: string; + }>; + } + yield { type: 'text', content: 'va bene. 
' }; + }, + } as unknown as LLMProvider; +} + function makeDeps(bridge: TelephonyBridge, agentOverrides: Partial): StreamHandlerDeps { const mockTts = new (ElevenLabsTTS as unknown as new (k: string, v?: string) => { synthesizeStream: (t: string) => AsyncIterable; @@ -184,4 +208,37 @@ describe('[mocked] pipeline backgrounded-dispatch barge-in', () => { (onHandler as unknown as { forwardSttWhileSpeaking: boolean }).forwardSttWhileSpeaking, ).toBe(true); }); + + it("a later transcript's history push does NOT contaminate the in-flight turn's prompt", async () => { + const stt = makeMockStt(); + const bridge = makeTwilioBridge(stt); + const captured: { messages?: Array<{ role: string; content: string }> } = {}; + const deps = makeDeps(bridge, { + llm: makeMessageCapturingProvider(captured) as unknown as AgentOptions['llm'], + }); + const handler = new StreamHandler(deps, makeMockWs(), '+15551111111', '+15552222222'); + await handler.handleCallStart('CA-bg-snapshot'); + + // Turn A is committed and its dispatch launched (backgrounded). emitTranscript + // resolves after the snapshot was captured at launch. + await stt.emitTranscript('domanda del turno A'); + // Simulate a FOLLOWING transcript's user push landing on the drain loop while + // turn A is still in flight (the exact race the snapshot fix guards against). + (handler as unknown as { history: { push: (e: unknown) => void } }).history.push({ + role: 'user', + text: 'TURNO B PIU TARDI', + timestamp: Date.now(), + }); + + await vi.waitFor(() => expect(captured.messages).toBeDefined(), { timeout: 3000 }); + await (handler as unknown as { dispatchTask: Promise | null }).dispatchTask?.catch( + () => {}, + ); + + const contents = (captured.messages ?? []).map((m) => m.content); + // Turn A's prompt was built from the launch-time snapshot — the later push + // is absent. With the pre-fix LIVE array it would leak in. 
+ expect(contents).toContain('domanda del turno A'); + expect(contents).not.toContain('TURNO B PIU TARDI'); + }, 10000); }); From 83052d1eb1c6c9569b725eb37c8231fc1fc2f76e Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Sun, 7 Jun 2026 12:32:36 +0200 Subject: [PATCH 05/11] =?UTF-8?q?fix(pipeline):=20echo-safe=20barge-in=20?= =?UTF-8?q?=E2=80=94=20drop=20agent=20self-echo,=20keep=20fast=20follow-up?= =?UTF-8?q?,=20mark=20interrupted=20turns?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1, no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu l'hai" — the agent's own Italian TTS echoing into Deepgram, not caught by the English-only hallucination filter), the real follow-up was dropped leaving an empty [interrupted] turn, and the post-barge-in context was poisoned. A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure) confirmed this is NOT an interruptibility problem — the abort already works (bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full Python/TS parity: 1. Echo guard (language-agnostic). Track the agent's in-flight spoken text (_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo / looksLikeEcho (substring OR >=60% word overlap) drops any barge-in (_handle_barge_in) or commit (_commit_transcript) that is the agent's own TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the default VAD path and real post-turn replies are unaffected. 2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE of the previous final (Deepgram speech_final+is_final for the same utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different fast follow-up is no longer swallowed into an empty [interrupted] turn. 3. Interrupted-turn context rewrite. 
On a confirmed mid-turn barge-in the spoken prefix is appended to history with an "[interrupted by caller]" marker, so a stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn that it was cut off and what the caller actually heard. Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate). Recommended caller config (unchanged SDK defaults): barge_in_strategies= (MinWordsStrategy(min_words=2),), echo_cancellation=True. Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11); updated the back-to-back dedup tests to the corrected behaviour. Python 2236 / TypeScript 1777 pass; tsc + build clean. --- CHANGELOG.md | 4 + libraries/python/getpatter/stream_handler.py | 123 +++++++++- .../test_pipeline_bargein_backgrounded.py | 6 +- .../python/tests/unit/test_pipeline_dedup.py | 38 ++- .../tests/unit/test_pipeline_echo_dedup.py | 218 ++++++++++++++++++ libraries/typescript/src/stream-handler.ts | 114 ++++++++- ...peline-bargein-backgrounded.mocked.test.ts | 18 +- .../tests/pipeline-echo-dedup.mocked.test.ts | 119 ++++++++++ 8 files changed, 618 insertions(+), 22 deletions(-) create mode 100644 libraries/python/tests/unit/test_pipeline_echo_dedup.py create mode 100644 libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index a98937b..7c09b0e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,10 @@ - **Decoupled, single-in-flight dispatch.** The turn now runs as one tracked background task (`_dispatch_task` / `dispatchTask`) so the receive loop keeps draining transcripts and runs barge-in detection against the LIVE turn. Exactly one dispatch is in flight: the loop settles the previous one before launching the next, so `conversation_history` / metrics ordering is unchanged. With no barge-in (default, VAD present, normal LLM) behaviour is unchanged — the loop still awaits the final turn to settle before returning. 
- **Prompt pre-first-token abort (Python).** Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk `cancel_event` check never runs. The provider now races `create()` + first-byte against the cancel signal and spawns a watchdog that `close()`s the response the instant a barge-in fires, so the request is torn down immediately instead of blocking the next turn (TS already aborts promptly via `fetch` + `AbortController`). The VAD legacy barge-in branch now also sets `_llm_cancel_event` (it previously only flipped `_is_speaking`), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast. - **`PATTER_FORWARD_STT_WHILE_SPEAKING` (opt-in, default off).** Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked PSTN links where the VAD never fires. The leading-edge ring buffer is still captured. **Echo caveat:** without AEC the agent's own voice may be transcribed as a phantom interruption — pair with `agent.barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `.../services/llm_loop.py`, `.../llm/openai_compatible.py`, `libraries/typescript/src/stream-handler.ts`. +- **Echo-safe barge-in: the agent no longer interrupts itself, and a fast real follow-up is no longer lost.** Hardening for the echo-prone agent-runtime case (`PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC), where the agent's own TTS bled into STT and was transcribed (e.g. a garbled fragment in another language not covered by the English hallucination filter), firing a phantom barge-in and leaving an empty `[interrupted]` turn: + - **Echo guard** — a language-agnostic check (`_looks_like_echo` / `looksLikeEcho`: substring or ≥60% word overlap against the agent's in-flight spoken text) now drops any candidate barge-in/commit that is the agent's own speech echoing back. 
Active only while forwarding audio during TTS, so the default VAD path and real post-turn replies are untouched. + - **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn. + - **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`. ## 0.6.5 (2026-06-05) diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index 14a0798..5cf0729 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -131,6 +131,50 @@ # ("We'll see you next time. Bye bye.") without importing ``re``. _SENTENCE_ENDERS = ".!?…。!?" +# Fraction of a candidate transcript's words that must appear in the agent's +# in-flight spoken text for it to be treated as the agent's own TTS echoing +# back (rather than real caller speech). 0.6 keeps real replies that merely +# share a couple of words while catching garbled echo fragments. Language- +# agnostic — unlike the English-only ``_STT_HALLUCINATIONS`` set. 
+_ECHO_WORD_OVERLAP_THRESHOLD = 0.6 + + +def _normalize_for_echo(text: str) -> str: + """Lowercase, drop punctuation, collapse whitespace — for echo comparison.""" + out = [] + for ch in text.lower(): + out.append(ch if (ch.isalnum() or ch.isspace()) else " ") + return " ".join("".join(out).split()) + + +def _looks_like_echo(candidate: str, agent_text: str) -> bool: + """True when ``candidate`` looks like a fragment of ``agent_text`` — i.e. the + agent's own TTS bleeding into STT (forwarded during TTS without effective + AEC) rather than real caller speech. Substring match OR high word-overlap. + """ + a = _normalize_for_echo(agent_text) + c = _normalize_for_echo(candidate) + if not a or not c: + return False + if c in a: # candidate is verbatim a fragment of what the agent said + return True + words = c.split() + if not words: + return False + agent_words = set(a.split()) + overlap = sum(1 for w in words if w in agent_words) / len(words) + return overlap >= _ECHO_WORD_OVERLAP_THRESHOLD + + +def _is_near_duplicate(a: str, b: str) -> bool: + """True when two normalised finals are the same utterance double-emitted + (identical, or one a substring of the other) — used to drop Deepgram's + ``speech_final``+``is_final`` back-to-back pair WITHOUT swallowing a + genuinely different utterance that merely arrives quickly.""" + if not a or not b: + return False + return a == b or a in b or b in a + def _is_stt_hallucination(text: str) -> bool: """True when *text* is — or is composed entirely of — known STT @@ -2383,6 +2427,17 @@ def __init__( # a no-op; cancelling avoids leaving an idle ``asyncio.sleep`` task # per turn on long, fast-turn calls. self._grace_task: asyncio.Task | None = None + # The agent's spoken text for the CURRENT turn, accumulated as tokens + # stream. Used by the echo guard to reject the agent's own TTS bleeding + # back into STT (when audio is forwarded during TTS without effective + # AEC) so it never barges in or becomes a phantom user turn. 
Reset at + # ``_begin_speaking``; only consulted while ``_forward_stt_while_speaking``. + self._current_agent_spoken_text = "" + # Whether the last completed turn was cut short by a confirmed barge-in + # — set by ``_process_streaming_response`` so the spoken prefix is + # appended to history with an ``[interrupted by caller]`` marker (keeps + # a stateful agent runtime's context grounded in what was actually heard). + self._last_response_interrupted = False # Per-turn LLM cancel event. Recreated on every new turn before LLM # consumption so a stale cancel from a previous turn cannot terminate # the next stream prematurely. Initialized here so the STT loop's @@ -3253,6 +3308,11 @@ async def _process_streaming_response(self, result, call_id: str) -> str: interrupted = True break full_response_parts.append(token) + # Keep the echo-guard reference current as the agent speaks, + # so a barge-in transcript arriving mid-turn can be compared + # against what the agent has said SO FAR (echo lags the + # tokens, so this is already ahead of the bleed). + self._current_agent_spoken_text = "".join(full_response_parts) # Fix 5: record LLM first-token (TTFT). if llm_first_token_sent[0] and self.metrics is not None: self.metrics.record_llm_first_token() @@ -3402,6 +3462,14 @@ async def _process_streaming_response(self, result, call_id: str) -> str: self.metrics.record_tts_complete(response_text) turn = self.metrics.record_turn_complete(response_text) await self._emit_turn_metrics(turn, call_id=call_id) + # Tell the caller (``_dispatch_turn``) whether this turn was cut short so + # the spoken prefix is recorded in history WITH a marker. A stateful + # agent runtime (Hermes/OpenClaw) then sees, on the next turn, that it + # was interrupted and what the caller actually heard — instead of an + # ungrounded full reply that pollutes its context. 
+ self._last_response_interrupted = interrupted + if interrupted and response_text: + response_text = f"{response_text} [interrupted by caller]" return response_text async def _process_regular_response(self, response_text: str, call_id: str) -> None: @@ -3474,6 +3542,21 @@ async def _handle_barge_in(self, transcript) -> None: # before the VAD speech_start rescue fires. await self._end_tail_grace_for_new_turn() return + # Echo guard: when audio is forwarded to STT during TTS (no effective + # AEC), the agent's own voice can be transcribed and would otherwise + # barge in on itself. Drop any transcript that looks like a fragment of + # what the agent is currently saying. Only active under + # ``_forward_stt_while_speaking`` (the only path that feeds TTS audio to + # STT), so the default VAD path is unaffected. Mirrors TS ``handleBargeIn``. + if getattr(self, "_forward_stt_while_speaking", False) and _looks_like_echo( + transcript.text, getattr(self, "_current_agent_spoken_text", "") + ): + logger.info( + "Barge-in suppressed: transcript matches agent's own speech " + "(echo) — %r", + sanitize_log_value(transcript.text[:40]), + ) + return if not self._can_barge_in(): aec_state = "on" if getattr(self, "_aec", None) is not None else "off" logger.info( @@ -3638,8 +3721,10 @@ def _commit_transcript(self, text: str) -> bool: Mirrors TS ``commitTranscript``. Returns ``True`` if the transcript should be committed to a turn, ``False`` if it must be dropped. - Drop reasons: common hallucinations, duplicate within 2 s, or any - final within 500 ms of the previous one. + Drop reasons: common hallucinations, the agent's own TTS echo (when + forwarding audio to STT during TTS), exact duplicate within 2 s, or a + near-duplicate within 500 ms (the same utterance double-emitted) — a + genuinely different fast follow-up is NOT dropped. 
""" now = time.time() normalised = text.strip().lower() @@ -3649,6 +3734,19 @@ def _commit_transcript(self, text: str) -> bool: if stripped in _STT_HALLUCINATIONS or stripped == "": logger.debug("Dropped likely STT hallucination: %r", normalised[:40]) return False + # Echo guard: while the agent is still speaking (the forward-STT echo + # window), a transcript that matches the agent's own speech is its TTS + # bleeding back into STT, not a user turn. Gated on + # ``_forward_stt_while_speaking`` + ``_is_speaking`` so a real post-turn + # reply (committed when the agent is idle) is never dropped, and the + # default VAD path — which withholds audio during TTS — is unaffected. + if ( + getattr(self, "_forward_stt_while_speaking", False) + and getattr(self, "_is_speaking", False) + and _looks_like_echo(text, getattr(self, "_current_agent_spoken_text", "")) + ): + logger.debug("Dropped agent-echo transcript (not a user turn): %r", normalised[:40]) + return False if since_last < 2.0 and normalised == self._last_commit_text: logger.debug( "Dropped duplicate final transcript (%.1fs since last): %r", @@ -3656,9 +3754,14 @@ def _commit_transcript(self, text: str) -> bool: normalised[:40], ) return False - if since_last < 0.5: + # Back-to-back: drop a NEAR-DUPLICATE within 0.5 s (Deepgram emitting + # ``speech_final`` then ``is_final`` for the SAME utterance). A + # genuinely DIFFERENT utterance arriving this fast (e.g. the real reply + # right after a suppressed phantom) must NOT be swallowed — dropping it + # unconditionally left an empty ``[interrupted]`` turn before this fix. 
+ if since_last < 0.5 and _is_near_duplicate(normalised, self._last_commit_text): logger.debug( - "Dropped back-to-back final transcript (%.2fs since last): %r", + "Dropped back-to-back near-duplicate final (%.2fs since last): %r", since_last, normalised[:40], ) @@ -4271,6 +4374,9 @@ async def _begin_speaking(self, is_first_message: bool = False) -> None: # turn so we never replay yesterday's audio to STT. self._inbound_audio_ring = [] self._suppressed_speech_pending = False + # Fresh turn — reset the echo-guard reference so this turn's barge-in + # checks compare against THIS turn's spoken text, not the last turn's. + self._current_agent_spoken_text = "" # Reset the VAD detector so the next user utterance triggers a clean # SILENCE→SPEECH transition. Without this, PSTN echo from the # previous turn can keep the smoothed probability above the @@ -4295,10 +4401,11 @@ def _mark_first_audio_sent(self) -> None: def _can_barge_in(self) -> bool: """Whether barge-in is allowed to fire right now. - Gate length depends on whether AEC is active: 1 s with AEC - (covers filter warmup), 0.25 s without (anti-flicker only — - keeps PSTN barge-in responsive, since on PSTN AEC is a no-op - and there is no warmup to protect). + Gate length depends on whether AEC is active: + ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_AEC`` with AEC (covers filter + warmup), ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN_NO_AEC`` (0.5 s) without + — an anti-flicker margin that keeps PSTN barge-in responsive while + rejecting the first burst of echo/noise before real speech. 
``getattr`` is used so test fixtures that flip ``_is_speaking`` directly (without going through ``_begin_speaking``) still diff --git a/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py index 295e200..f362e0c 100644 --- a/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py +++ b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py @@ -135,7 +135,11 @@ async def receive_transcripts(self) -> AsyncIterator[Transcript]: # Turn 1's LLM stream observed the cancel (it was torn down, not left # running until the next turn). assert cancel_seen.is_set() - assert handler._is_speaking is False + # The REAL follow-up "ferma per favore" (different text, arriving <0.5s + # after turn 1) is NOT swallowed by the back-to-back dedup — it + # dispatches as a fresh turn (calls >= 2). Before the dedup fix it was + # dropped, leaving an empty [interrupted] turn and no reply. + assert handler._llm_loop.calls >= 2 @pytest.mark.unit diff --git a/libraries/python/tests/unit/test_pipeline_dedup.py b/libraries/python/tests/unit/test_pipeline_dedup.py index ba1bc25..a85f2fc 100644 --- a/libraries/python/tests/unit/test_pipeline_dedup.py +++ b/libraries/python/tests/unit/test_pipeline_dedup.py @@ -272,12 +272,16 @@ async def test_duplicate_normalises_whitespace_and_case( @pytest.mark.unit @pytest.mark.asyncio class TestThrottleFilter: - """Rule 3 — drop ANY final that lands within 500 ms of the last turn.""" + """Rule 3 — drop a NEAR-DUPLICATE final within 500 ms (Deepgram emitting + ``speech_final`` then ``is_final`` for the same utterance). A genuinely + DIFFERENT fast follow-up must NOT be swallowed (the empty-[interrupted]-turn + fix).""" - async def test_drops_back_to_back_under_500ms( + async def test_drops_back_to_back_near_duplicate_under_500ms( self, monkeypatch: pytest.MonkeyPatch ) -> None: - # Different text but only 0.2 s apart — treated as STT over-firing.
+ # Same utterance double-emitted 0.2 s apart (the second a superset of the + # first) — real STT over-firing, still de-duplicated. times = iter([100.0, 100.2]) monkeypatch.setattr( "getpatter.stream_handler.time.time", @@ -286,7 +290,7 @@ async def test_drops_back_to_back_under_500ms( stt = _StubSTT( [ Transcript(text="What time is it", is_final=True, confidence=0.9), - Transcript(text="Tell me the weather", is_final=True, confidence=0.9), + Transcript(text="What time is it now", is_final=True, confidence=0.9), ] ) on_transcript = AsyncMock() @@ -294,9 +298,33 @@ async def test_drops_back_to_back_under_500ms( await _run_loop(handler) - # Only the first should have been forwarded. + # Only the first should have been forwarded (near-duplicate dropped). assert on_transcript.await_count == 1 + async def test_keeps_different_followup_under_500ms( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + # Two genuinely DIFFERENT utterances 0.2 s apart are NOT over-firing — + # both must be kept (before the fix the second was silently dropped, + # leaving an empty [interrupted] turn). + times = iter([100.0, 100.2]) + monkeypatch.setattr( + "getpatter.stream_handler.time.time", + lambda: next(times), + ) + stt = _StubSTT( + [ + Transcript(text="What time is it", is_final=True, confidence=0.9), + Transcript(text="Tell me the weather", is_final=True, confidence=0.9), + ] + ) + on_transcript = AsyncMock() + handler = _make_handler(stt, on_transcript) + + await _run_loop(handler) + + assert on_transcript.await_count == 2 + async def test_passes_after_500ms(self, monkeypatch: pytest.MonkeyPatch) -> None: # Different text, 700 ms apart — legitimate second turn. 
times = iter([100.0, 100.7]) diff --git a/libraries/python/tests/unit/test_pipeline_echo_dedup.py b/libraries/python/tests/unit/test_pipeline_echo_dedup.py new file mode 100644 index 0000000..248d9d3 --- /dev/null +++ b/libraries/python/tests/unit/test_pipeline_echo_dedup.py @@ -0,0 +1,218 @@ +"""Echo-guard, back-to-back dedup, and interrupted-turn marking for the +pipeline turn-taking path — the residual Hermes/OpenClaw barge-in fixes. + +Root causes (live Hermes test, PATTER_FORWARD_STT_WHILE_SPEAKING=1, no AEC): +* the agent's own TTS bled into Deepgram and was transcribed as a phantom + ("che tu l'hai"), firing a false barge-in (legacy "any transcript = cancel"); +* the real follow-up final arriving <0.5s later was dropped by the back-to-back + filter even though its text was completely different → empty [interrupted] turn; +* the interrupted assistant turn was stored ungrounded, poisoning the next turn. +""" + +from __future__ import annotations + +import asyncio +import time +from collections import deque +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from getpatter.providers.base import Transcript +from getpatter.stream_handler import ( + PipelineStreamHandler, + _is_near_duplicate, + _looks_like_echo, + _normalize_for_echo, +) + +from tests.conftest import make_agent + + +def _make_handler() -> PipelineStreamHandler: + handler = PipelineStreamHandler( + agent=make_agent(), + audio_sender=AsyncMock(), + call_id="call-echo", + caller="+15551110000", + callee="+15552220000", + resolved_prompt="p", + metrics=MagicMock(), + for_twilio=True, + on_transcript=None, + conversation_history=deque(maxlen=20), + transcript_entries=deque(maxlen=20), + ) + handler.on_message = None + handler._stt = AsyncMock() + return handler + + +# --------------------------------------------------------------------------- +# Pure helpers +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +class TestEchoHelpers: + 
def test_normalize_strips_punct_and_case(self) -> None: + assert _normalize_for_echo("Ciao, come VA?!") == "ciao come va" + + def test_substring_fragment_is_echo(self) -> None: + agent = "Certo, ti racconto una storia molto lunga sul mare" + assert _looks_like_echo("una storia molto", agent) is True + + def test_high_word_overlap_is_echo(self) -> None: + agent = "che tu lo voglia o no, te l'ho già detto" + # garbled echo fragment whose words are mostly in the agent text + assert _looks_like_echo("che tu l'hai", agent) is True + + def test_unrelated_user_speech_is_not_echo(self) -> None: + agent = "Sto bene grazie, sono pronto ad aiutarti col tuo problema" + assert _looks_like_echo("fermati dimmi solo interrotto", agent) is False + + def test_empty_inputs_not_echo(self) -> None: + assert _looks_like_echo("", "qualcosa") is False + assert _looks_like_echo("qualcosa", "") is False + + def test_near_duplicate_substring_and_exact(self) -> None: + assert _is_near_duplicate("ciao come va", "ciao come va") is True + assert _is_near_duplicate("ciao come", "ciao come va") is True # prefix + assert _is_near_duplicate("ciao come va bene", "ciao come va") is True + assert _is_near_duplicate("fermati subito", "dimmi una storia") is False + + +# --------------------------------------------------------------------------- +# _commit_transcript +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +class TestCommitTranscriptEchoAndDedup: + def test_echo_dropped_while_speaking_with_forward_flag(self) -> None: + h = _make_handler() + h._forward_stt_while_speaking = True + h._is_speaking = True + h._current_agent_spoken_text = "ti racconto una storia lunga sul mare" + assert h._commit_transcript("una storia lunga") is False + + def test_echo_not_dropped_when_flag_off(self) -> None: + h = _make_handler() + h._forward_stt_while_speaking = False # default + h._is_speaking = True + h._current_agent_spoken_text = "ti racconto una storia lunga 
sul mare" + # Flag off → echo guard inert → normal commit (real user could legitimately + # echo words; we only filter under the forward-STT echo-prone config). + assert h._commit_transcript("una storia lunga") is True + + def test_echo_not_dropped_when_idle(self) -> None: + h = _make_handler() + h._forward_stt_while_speaking = True + h._is_speaking = False # post-turn user reply, not an echo window + h._current_agent_spoken_text = "ti racconto una storia lunga sul mare" + assert h._commit_transcript("una storia lunga") is True + + def test_different_followup_within_500ms_not_dropped(self) -> None: + h = _make_handler() + h._last_commit_text = "dimmi una storia" + h._last_commit_at = time.time() # just now + # A genuinely different utterance arriving <0.5s later must survive + # (the empty-[interrupted]-turn fix). + assert h._commit_transcript("fermati dimmi solo interrotto") is True + + def test_near_duplicate_within_500ms_dropped(self) -> None: + h = _make_handler() + h._last_commit_text = "fermati dimmi solo" + h._last_commit_at = time.time() + # Deepgram speech_final then is_final for the same utterance (a superset) + # is still de-duplicated. + assert h._commit_transcript("fermati dimmi solo interrotto") is False + + +# --------------------------------------------------------------------------- +# _handle_barge_in echo guard +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestHandleBargeInEchoGuard: + async def test_echo_transcript_does_not_barge_in(self) -> None: + h = _make_handler() + h._forward_stt_while_speaking = True + h._is_speaking = True + h._tail_grace_active = False + h._can_barge_in = lambda: True # type: ignore[assignment] + h._current_agent_spoken_text = "ti racconto una storia lunga sul mare aperto" + + await h._handle_barge_in(Transcript(text="una storia lunga", is_final=True, confidence=0.9)) + + # No cancel: the agent's own echo must not interrupt it. 
+ h.audio_sender.send_clear.assert_not_awaited() + assert h._is_speaking is True + + async def test_real_speech_still_barges_in(self) -> None: + h = _make_handler() + h._forward_stt_while_speaking = True + h._is_speaking = True + h._tail_grace_active = False + h._can_barge_in = lambda: True # type: ignore[assignment] + h._current_agent_spoken_text = "ti racconto una storia lunga sul mare aperto" + + await h._handle_barge_in( + Transcript(text="fermati dimmi solo interrotto", is_final=True, confidence=0.9) + ) + + h.audio_sender.send_clear.assert_awaited() + assert h._is_speaking is False + + +# --------------------------------------------------------------------------- +# Interrupted-turn marking +# --------------------------------------------------------------------------- + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestInterruptedTurnMarking: + async def test_interrupted_response_gets_marker(self) -> None: + h = _make_handler() + + class _FakeTTS: + output_format = "pcm_16000" + + async def synthesize(self, text: str): + yield b"\x00\x00" * 80 + + h._tts = _FakeTTS() # type: ignore[assignment] + + async def _result(): + yield "Ti racconto. " + # Simulate a barge-in cancelling the stream mid-turn. + h._llm_cancel_event.set() + yield "Questo non si sente." + + text = await h._process_streaming_response(_result(), "call-echo") + + assert h._last_response_interrupted is True + assert text.endswith("[interrupted by caller]") + assert "Ti racconto." in text + + async def test_complete_response_no_marker(self) -> None: + h = _make_handler() + + class _FakeTTS: + output_format = "pcm_16000" + + async def synthesize(self, text: str): + yield b"\x00\x00" * 80 + + h._tts = _FakeTTS() # type: ignore[assignment] + + async def _result(): + yield "Tutto bene, grazie. 
" + + text = await h._process_streaming_response(_result(), "call-echo") + + assert h._last_response_interrupted is False + assert "[interrupted by caller]" not in text diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index cbacbf3..9338883 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -342,6 +342,45 @@ export function isSttHallucination(text: string): boolean { return pieces.length > 1 && pieces.every((p) => HALLUCINATIONS.has(p)); } +/** Fraction of a candidate's words that must appear in the agent's spoken text + * for it to count as the agent's own TTS echoing back. Mirrors Python + * ``_ECHO_WORD_OVERLAP_THRESHOLD``. */ +const ECHO_WORD_OVERLAP_THRESHOLD = 0.6; + +/** Lowercase, drop punctuation, collapse whitespace — for echo comparison. */ +export function normalizeForEcho(text: string): string { + return text + .toLowerCase() + .replace(/[^\p{L}\p{N}\s]/gu, ' ') + .replace(/\s+/u, ' ') + .trim() + .replace(/\s+/gu, ' '); +} + +/** True when ``candidate`` looks like a fragment of ``agentText`` — i.e. the + * agent's own TTS bleeding into STT (forwarded during TTS without effective + * AEC) rather than real caller speech. Substring OR high word-overlap. + * Mirrors Python ``_looks_like_echo``. */ +export function looksLikeEcho(candidate: string, agentText: string): boolean { + const a = normalizeForEcho(agentText); + const c = normalizeForEcho(candidate); + if (!a || !c) return false; + if (a.includes(c)) return true; + const words = c.split(' ').filter(Boolean); + if (words.length === 0) return false; + const agentWords = new Set(a.split(' ')); + const overlap = words.filter((w) => agentWords.has(w)).length / words.length; + return overlap >= ECHO_WORD_OVERLAP_THRESHOLD; +} + +/** True when two normalised finals are the same utterance double-emitted + * (identical, or one a substring of the other). Mirrors Python + * ``_is_near_duplicate``. 
*/ +export function isNearDuplicate(a: string, b: string): boolean { + if (!a || !b) return false; + return a === b || a.includes(b) || b.includes(a); +} + // --------------------------------------------------------------------------- // StreamHandler context (immutable per-call configuration) // --------------------------------------------------------------------------- @@ -650,6 +689,9 @@ export class StreamHandler { // Fresh turn — drop any stale pre-barge-in buffer from a previous turn // so we never replay yesterday's audio to STT. this.inboundAudioRing = []; + // Fresh turn — reset the echo-guard reference so barge-in checks compare + // against THIS turn's spoken text, not the last turn's. + this.currentAgentSpokenText = ''; // Reset the VAD detector so the next user utterance triggers a clean // SILENCE→SPEECH transition. Without this, PSTN echo from the previous // turn can keep the detector's smoothed probability above the @@ -1052,6 +1094,12 @@ export class StreamHandler { // Throttle state for back-to-back STT finals — see ``commitTranscript``. private lastCommitText = ''; private lastCommitAt = 0; + /** The agent's spoken text for the CURRENT turn, accumulated as tokens stream. + * The echo guard rejects transcripts matching it (the agent's own TTS bleeding + * back into STT when audio is forwarded during TTS without effective AEC). + * Reset in ``beginSpeaking``; only consulted while ``forwardSttWhileSpeaking``. + * Parity with Python ``_current_agent_spoken_text``. */ + private currentAgentSpokenText = ''; // PCM16 byte-alignment carry for TTS streaming (pipeline mode). // HTTP streams from ElevenLabs / OpenAI / Cartesia can yield chunks of any // size, including odd byte counts. 
Silently dropping the trailing odd byte @@ -2715,6 +2763,21 @@ export class StreamHandler { this.endTailGraceForNewTurn(); return false; } + // Echo guard: when audio is forwarded to STT during TTS (no effective AEC), + // the agent's own voice can be transcribed and would barge in on itself. + // Drop transcripts that look like a fragment of what the agent is saying. + // Only under forwardSttWhileSpeaking, so the default VAD path is unaffected. + if ( + this.forwardSttWhileSpeaking && + looksLikeEcho(transcript.text, this.currentAgentSpokenText) + ) { + getLogger().info( + `Barge-in suppressed: transcript matches agent's own speech (echo) — ${sanitizeLogValue( + transcript.text.slice(0, 40), + )}`, + ); + return false; + } if (!this.canBargeIn()) { getLogger().info( `Barge-in transcript suppressed (agent speaking < gate, aec=${this.aec ? 'on' : 'off'})`, @@ -2763,6 +2826,19 @@ export class StreamHandler { this.endTailGraceForNewTurn(); return false; } + // Echo guard (parity with handleBargeInAsync) — never let the agent's own + // forwarded TTS echo barge in on itself. + if ( + this.forwardSttWhileSpeaking && + looksLikeEcho(transcript.text, this.currentAgentSpokenText) + ) { + getLogger().info( + `Barge-in suppressed: transcript matches agent's own speech (echo) — ${sanitizeLogValue( + transcript.text.slice(0, 40), + )}`, + ); + return false; + } if (this.bargeInStrategies.length === 0) { // Legacy synchronous path — preserve exact byte-for-byte behaviour // for users who haven't opted into the confirm pipeline. @@ -2876,15 +2952,34 @@ export class StreamHandler { getLogger().debug(`Dropped likely STT hallucination: ${sanitizeLogValue(normalised.slice(0, 40))}`); return false; } + // Echo guard: while the agent is still speaking (the forward-STT echo + // window), a transcript that matches the agent's own speech is its TTS + // bleeding back into STT, not a user turn. 
Gated on forwardSttWhileSpeaking + // + isSpeaking so a real post-turn reply (committed when idle) is never + // dropped, and the default VAD path is unaffected. Parity with Python. + if ( + this.forwardSttWhileSpeaking && + this.isSpeaking && + looksLikeEcho(text, this.currentAgentSpokenText) + ) { + getLogger().debug( + `Dropped agent-echo transcript (not a user turn): ${sanitizeLogValue(normalised.slice(0, 40))}`, + ); + return false; + } if (sinceLastMs < 2000 && normalised === this.lastCommitText) { getLogger().debug( `Dropped duplicate final transcript (${(sinceLastMs / 1000).toFixed(1)}s since last): ${sanitizeLogValue(normalised.slice(0, 40))}`, ); return false; } - if (sinceLastMs < 500) { + // Back-to-back: drop a NEAR-DUPLICATE within 500 ms (Deepgram emitting + // speech_final then is_final for the SAME utterance). A genuinely DIFFERENT + // fast follow-up must NOT be swallowed — dropping it unconditionally left + // an empty [interrupted] turn before this fix. Parity with Python. + if (sinceLastMs < 500 && isNearDuplicate(normalised, this.lastCommitText)) { getLogger().debug( - `Dropped back-to-back final transcript (${(sinceLastMs / 1000).toFixed(2)}s since last): ${sanitizeLogValue(normalised.slice(0, 40))}`, + `Dropped back-to-back near-duplicate final (${(sinceLastMs / 1000).toFixed(2)}s since last): ${sanitizeLogValue(normalised.slice(0, 40))}`, ); return false; } @@ -3037,6 +3132,10 @@ export class StreamHandler { // Idempotent in the dispatcher. await this.emitLlmFirstToken(); allParts.push(token); + // Keep the echo-guard reference current as the agent speaks, so a + // barge-in transcript mid-turn is compared against what the agent has + // said so far (echo lags the tokens). Parity with Python. 
+ this.currentAgentSpokenText = allParts.join(''); for (const sentence of chunker.push(token)) { if (!this.isSpeaking) break; await guardAndSpeak(sentence, !firstSentenceEmitted); @@ -3106,7 +3205,16 @@ export class StreamHandler { // Swallow — span teardown should never crash the call path. } } - return allParts.join(''); + const responseText = allParts.join(''); + // Tag the spoken prefix with an ``[interrupted by caller]`` marker when the + // turn was cut short, so a stateful agent runtime (Hermes/OpenClaw) sees, + // next turn, that it was interrupted and what the caller actually heard — + // not an ungrounded full reply that pollutes its context. Parity with + // Python ``_process_streaming_response``. + if (llmSignal.aborted && responseText) { + return `${responseText} [interrupted by caller]`; + } + return responseText; } /** diff --git a/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts index 91e0476..40ed429 100644 --- a/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts +++ b/libraries/typescript/tests/pipeline-bargein-backgrounded.mocked.test.ts @@ -73,9 +73,11 @@ function makeTwilioBridge(mockStt: ReturnType): TelephonyBri } as unknown as TelephonyBridge; } -/** Provider that parks until its turn is aborted — models a long Hermes turn - * (tools running before the first token) that only ends on barge-in. */ +/** Provider whose FIRST turn parks until aborted (models a long Hermes turn + * that only ends on barge-in); any later turn replies quickly so the real + * follow-up — no longer swallowed by the back-to-back dedup — can complete. 
*/ function makeParkUntilAbortProvider(aborted: { value: boolean }): LLMProvider { + let calls = 0; return { model: 'agent-runtime-1', async *stream( @@ -83,6 +85,11 @@ function makeParkUntilAbortProvider(aborted: { value: boolean }): LLMProvider { _tools?: Array> | null, opts?: LLMStreamOptions, ): AsyncGenerator { + calls += 1; + if (calls > 1) { + yield { type: 'text', content: 'va bene. ' }; + return; + } const signal = opts?.signal; await new Promise((resolve) => { if (signal?.aborted) return resolve(); @@ -176,15 +183,16 @@ describe('[mocked] pipeline backgrounded-dispatch barge-in', () => { await stt.emitTranscript('ferma per favore'); // The in-flight turn was cancelled: the carrier buffer was cleared and the - // LLM stream's abort signal fired (turn torn down pre-first-token). + // LLM stream's abort signal fired (turn 1 torn down pre-first-token). await vi.waitFor( () => expect(bridge.sendClear as ReturnType).toHaveBeenCalled(), { timeout: 3000 }, ); - expect((handler as unknown as { isSpeaking: boolean }).isSpeaking).toBe(false); await vi.waitFor(() => expect(aborted.value).toBe(true), { timeout: 3000 }); - // Settle the backgrounded dispatch before teardown. + // Settle the backgrounded dispatch before teardown. The real follow-up + // "ferma per favore" (different text, <0.5s) is NOT swallowed by the + // back-to-back dedup — it dispatches as turn 2 and replies. await (handler as unknown as { dispatchTask: Promise | null }).dispatchTask?.catch( () => {}, ); diff --git a/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts b/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts new file mode 100644 index 0000000..b36fbc1 --- /dev/null +++ b/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts @@ -0,0 +1,119 @@ +/** + * [mocked] Echo guard + back-to-back dedup for the pipeline turn-taking path — + * parity with Python test_pipeline_echo_dedup.py. 
Stops the agent's own TTS + * echo (forwarded to STT during TTS without AEC) from firing a phantom barge-in + * or becoming a user turn, and keeps a genuinely different fast follow-up from + * being swallowed by the back-to-back filter. + */ +import { describe, it, expect, vi } from 'vitest'; +import { + StreamHandler, + looksLikeEcho, + normalizeForEcho, + isNearDuplicate, +} from '../src/stream-handler'; +import type { TelephonyBridge, StreamHandlerDeps } from '../src/stream-handler'; +import { MetricsStore } from '../src/dashboard/store'; +import { RemoteMessageHandler } from '../src/remote-message'; +import type { AgentOptions } from '../src/types'; +import type { WebSocket as WSWebSocket } from 'ws'; + +function makeMockWs(): WSWebSocket { + return { + send: vi.fn(), close: vi.fn(), on: vi.fn(), once: vi.fn(), readyState: 1, + removeListener: vi.fn(), addEventListener: vi.fn(), removeEventListener: vi.fn(), + } as unknown as WSWebSocket; +} +function makeBridge(): TelephonyBridge { + return { + label: 'Twilio', telephonyProvider: 'twilio', + sendAudio: vi.fn(), sendMark: vi.fn(), sendClear: vi.fn(), + transferCall: vi.fn().mockResolvedValue(undefined), + endCall: vi.fn().mockResolvedValue(undefined), + createStt: vi.fn().mockReturnValue(null), + queryTelephonyCost: vi.fn().mockResolvedValue(undefined), + } as unknown as TelephonyBridge; +} +function makeDeps(): StreamHandlerDeps { + const agent: AgentOptions = { + systemPrompt: 'test', provider: 'pipeline', model: 'gpt-4o-mini', voice: 'alloy', + }; + return { + config: {}, agent, bridge: makeBridge(), metricsStore: new MetricsStore(), + pricing: null, remoteHandler: new RemoteMessageHandler(), recording: false, + buildAIAdapter: vi.fn().mockReturnValue(null), + sanitizeVariables: vi.fn((r: Record) => r), + resolveVariables: vi.fn((t: string) => t), + } as unknown as StreamHandlerDeps; +} +interface CommitHandle { + forwardSttWhileSpeaking: boolean; + isSpeaking: boolean; + currentAgentSpokenText: string; + 
lastCommitText: string; + lastCommitAt: number; + commitTranscript(text: string): boolean; +} +function makeHandler(): CommitHandle { + return new StreamHandler(makeDeps(), makeMockWs(), '+1', '+2') as unknown as CommitHandle; +} + +describe('[mocked] echo + dedup helpers', () => { + it('normalizeForEcho strips punctuation and case', () => { + expect(normalizeForEcho('Ciao, come VA?!')).toBe('ciao come va'); + }); + it('looksLikeEcho: substring fragment is echo', () => { + expect(looksLikeEcho('una storia molto', 'Certo, ti racconto una storia molto lunga')).toBe(true); + }); + it('looksLikeEcho: high word overlap is echo', () => { + expect(looksLikeEcho("che tu l'hai", "che tu lo voglia o no, te l'ho già detto")).toBe(true); + }); + it('looksLikeEcho: unrelated user speech is not echo', () => { + expect(looksLikeEcho('fermati dimmi solo interrotto', 'Sto bene grazie sono pronto ad aiutarti')).toBe(false); + }); + it('looksLikeEcho: empty inputs are not echo', () => { + expect(looksLikeEcho('', 'qualcosa')).toBe(false); + expect(looksLikeEcho('qualcosa', '')).toBe(false); + }); + it('isNearDuplicate: exact and substring', () => { + expect(isNearDuplicate('ciao come va', 'ciao come va')).toBe(true); + expect(isNearDuplicate('ciao come', 'ciao come va')).toBe(true); + expect(isNearDuplicate('fermati subito', 'dimmi una storia')).toBe(false); + }); +}); + +describe('[mocked] commitTranscript echo + dedup', () => { + it('drops echo while speaking with the forward flag', () => { + const h = makeHandler(); + h.forwardSttWhileSpeaking = true; + h.isSpeaking = true; + h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; + expect(h.commitTranscript('una storia lunga')).toBe(false); + }); + it('does NOT drop echo when the flag is off (default)', () => { + const h = makeHandler(); + h.forwardSttWhileSpeaking = false; + h.isSpeaking = true; + h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; + expect(h.commitTranscript('una storia 
lunga')).toBe(true); + }); + it('does NOT drop when idle (post-turn user reply)', () => { + const h = makeHandler(); + h.forwardSttWhileSpeaking = true; + h.isSpeaking = false; + h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; + expect(h.commitTranscript('una storia lunga')).toBe(true); + }); + it('keeps a different follow-up within 500ms (empty-[interrupted]-turn fix)', () => { + const h = makeHandler(); + h.lastCommitText = 'dimmi una storia'; + h.lastCommitAt = Date.now(); + expect(h.commitTranscript('fermati dimmi solo interrotto')).toBe(true); + }); + it('drops a near-duplicate within 500ms (Deepgram double-final)', () => { + const h = makeHandler(); + h.lastCommitText = 'fermati dimmi solo'; + h.lastCommitAt = Date.now(); + expect(h.commitTranscript('fermati dimmi solo interrotto')).toBe(false); + }); +}); From 2d88502285f2ada5f7a72b242df4ba2a478a86aa Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Sun, 7 Jun 2026 12:57:38 +0200 Subject: [PATCH 06/11] =?UTF-8?q?fix(pipeline):=20harden=20echo-safe=20bar?= =?UTF-8?q?ge-in=20=E2=80=94=20no=20false-positives=20on=20short=20replies?= =?UTF-8?q?,=20word-boundary=20dedup,=20clean=20interrupted=20metrics?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adversarial review of the echo-safe barge-in commit found three real HIGH false-positive risks; all fixed with full Python/TS parity: 1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller "lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is a long near-complete fragment, not a 1-3 word reply. The echo guard now requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a candidate as echo, so short answers are never dropped. (Short echo blips on a no-AEC link are left to AEC / barge_in_strategies.) 2. 
(HIGH) Back-to-back dedup used a character-level substring test, so a genuinely different short follow-up was dropped ("no" matched inside "nothing else") — and this ran on the DEFAULT path (not gated on the echo flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is now word-boundary aware (equal, or a true word-prefix double-emit), so "nothing else" is no longer a duplicate of "no" while Deepgram's speech_final+is_final pair still de-duplicates. 3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to recordTtsComplete/recordTurnComplete. runPipelineLlm now returns { text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated on !interrupted) and applies the marker to the history/transcript only — mirroring Python, where metrics are recorded before the marker is appended. Tests updated to the corrected behaviour (>=4-word echo examples + explicit short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript 1779 pass; tsc + build clean. --- libraries/python/getpatter/stream_handler.py | 27 ++++++--- .../tests/unit/test_pipeline_echo_dedup.py | 29 +++++++--- libraries/typescript/src/stream-handler.ts | 57 +++++++++++++------ .../tests/pipeline-echo-dedup.mocked.test.ts | 31 +++++++--- 4 files changed, 103 insertions(+), 41 deletions(-) diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index 5cf0729..3d322ed 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -137,6 +137,12 @@ # share a couple of words while catching garbled echo fragments. Language- # agnostic — unlike the English-only ``_STT_HALLUCINATIONS`` set. _ECHO_WORD_OVERLAP_THRESHOLD = 0.6 +# Minimum word count before a candidate can be classified as echo. 
Real TTS +# bleed is a long, near-complete fragment of the agent's speech; a 1-3 word +# caller reply that happens to repeat the agent's offered words ("lunedì", +# "yes", "Monday at two") is a legitimate answer and must NEVER be dropped. +# Short echo blips on a no-AEC link are left to AEC / barge_in_strategies. +_ECHO_MIN_CANDIDATE_WORDS = 4 def _normalize_for_echo(text: str) -> str: @@ -156,11 +162,13 @@ def _looks_like_echo(candidate: str, agent_text: str) -> bool: c = _normalize_for_echo(candidate) if not a or not c: return False - if c in a: # candidate is verbatim a fragment of what the agent said - return True words = c.split() - if not words: + # Never classify a short reply as echo — exempts single-word / few-word + # caller answers that legitimately repeat the agent's offered words. + if len(words) < _ECHO_MIN_CANDIDATE_WORDS: return False + if c in a: # candidate is verbatim a long fragment of what the agent said + return True agent_words = set(a.split()) overlap = sum(1 for w in words if w in agent_words) / len(words) return overlap >= _ECHO_WORD_OVERLAP_THRESHOLD @@ -168,12 +176,17 @@ def _looks_like_echo(candidate: str, agent_text: str) -> bool: def _is_near_duplicate(a: str, b: str) -> bool: """True when two normalised finals are the same utterance double-emitted - (identical, or one a substring of the other) — used to drop Deepgram's - ``speech_final``+``is_final`` back-to-back pair WITHOUT swallowing a - genuinely different utterance that merely arrives quickly.""" + (identical, or one a WORD-PREFIX of the other — Deepgram's + ``speech_final``+``is_final`` pair) — used to drop the back-to-back pair + WITHOUT swallowing a genuinely different utterance that merely arrives + quickly. 
Word-boundary aware so a character infix ("no" in "nothing + else") is NOT treated as a duplicate.""" if not a or not b: return False - return a == b or a in b or b in a + if a == b: + return True + shorter, longer = (a, b) if len(a) <= len(b) else (b, a) + return longer.startswith(shorter + " ") def _is_stt_hallucination(text: str) -> bool: diff --git a/libraries/python/tests/unit/test_pipeline_echo_dedup.py b/libraries/python/tests/unit/test_pipeline_echo_dedup.py index 248d9d3..d7992b5 100644 --- a/libraries/python/tests/unit/test_pipeline_echo_dedup.py +++ b/libraries/python/tests/unit/test_pipeline_echo_dedup.py @@ -59,13 +59,22 @@ def test_normalize_strips_punct_and_case(self) -> None: assert _normalize_for_echo("Ciao, come VA?!") == "ciao come va" def test_substring_fragment_is_echo(self) -> None: - agent = "Certo, ti racconto una storia molto lunga sul mare" - assert _looks_like_echo("una storia molto", agent) is True + agent = "Certo, ti racconto una storia molto lunga sul mare aperto" + # A long (>=4 word) verbatim fragment of the agent's speech is echo. + assert _looks_like_echo("ti racconto una storia molto", agent) is True def test_high_word_overlap_is_echo(self) -> None: - agent = "che tu lo voglia o no, te l'ho già detto" - # garbled echo fragment whose words are mostly in the agent text - assert _looks_like_echo("che tu l'hai", agent) is True + agent = "che tu lo voglia o no, te l'ho già detto chiaramente" + # garbled >=4-word echo fragment whose words are mostly in the agent text + assert _looks_like_echo("che tu lo voglia detto", agent) is True + + def test_short_answer_repeating_agent_is_not_echo(self) -> None: + # The key false-positive guard: a 1-3 word caller answer that picks one + # of the agent's offered words must NEVER be classified as echo. 
+ agent = "preferisci lunedì o martedì per l'appuntamento" + assert _looks_like_echo("lunedì", agent) is False + assert _looks_like_echo("monday at two", agent) is False + assert _looks_like_echo("sì va bene", agent) is False def test_unrelated_user_speech_is_not_echo(self) -> None: agent = "Sto bene grazie, sono pronto ad aiutarti col tuo problema" @@ -94,7 +103,7 @@ def test_echo_dropped_while_speaking_with_forward_flag(self) -> None: h._forward_stt_while_speaking = True h._is_speaking = True h._current_agent_spoken_text = "ti racconto una storia lunga sul mare" - assert h._commit_transcript("una storia lunga") is False + assert h._commit_transcript("ti racconto una storia lunga") is False def test_echo_not_dropped_when_flag_off(self) -> None: h = _make_handler() @@ -103,14 +112,14 @@ def test_echo_not_dropped_when_flag_off(self) -> None: h._current_agent_spoken_text = "ti racconto una storia lunga sul mare" # Flag off → echo guard inert → normal commit (real user could legitimately # echo words; we only filter under the forward-STT echo-prone config). 
- assert h._commit_transcript("una storia lunga") is True + assert h._commit_transcript("ti racconto una storia lunga") is True def test_echo_not_dropped_when_idle(self) -> None: h = _make_handler() h._forward_stt_while_speaking = True h._is_speaking = False # post-turn user reply, not an echo window h._current_agent_spoken_text = "ti racconto una storia lunga sul mare" - assert h._commit_transcript("una storia lunga") is True + assert h._commit_transcript("ti racconto una storia lunga") is True def test_different_followup_within_500ms_not_dropped(self) -> None: h = _make_handler() @@ -145,7 +154,9 @@ async def test_echo_transcript_does_not_barge_in(self) -> None: h._can_barge_in = lambda: True # type: ignore[assignment] h._current_agent_spoken_text = "ti racconto una storia lunga sul mare aperto" - await h._handle_barge_in(Transcript(text="una storia lunga", is_final=True, confidence=0.9)) + await h._handle_barge_in( + Transcript(text="ti racconto una storia lunga", is_final=True, confidence=0.9) + ) # No cancel: the agent's own echo must not interrupt it. h.audio_sender.send_clear.assert_not_awaited() diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index 9338883..7757313 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -347,6 +347,12 @@ export function isSttHallucination(text: string): boolean { * ``_ECHO_WORD_OVERLAP_THRESHOLD``. */ const ECHO_WORD_OVERLAP_THRESHOLD = 0.6; +/** Minimum word count before a candidate can be classified as echo — short + * caller replies that repeat the agent's offered words ("lunedì", "yes", + * "Monday at two") are legitimate answers, never echo. Mirrors Python + * ``_ECHO_MIN_CANDIDATE_WORDS``. */ +const ECHO_MIN_CANDIDATE_WORDS = 4; + /** Lowercase, drop punctuation, collapse whitespace — for echo comparison. 
*/ export function normalizeForEcho(text: string): string { return text @@ -365,9 +371,11 @@ export function looksLikeEcho(candidate: string, agentText: string): boolean { const a = normalizeForEcho(agentText); const c = normalizeForEcho(candidate); if (!a || !c) return false; - if (a.includes(c)) return true; const words = c.split(' ').filter(Boolean); - if (words.length === 0) return false; + // Never classify a short reply as echo — exempts single-word / few-word + // caller answers that legitimately repeat the agent's offered words. + if (words.length < ECHO_MIN_CANDIDATE_WORDS) return false; + if (a.includes(c)) return true; const agentWords = new Set(a.split(' ')); const overlap = words.filter((w) => agentWords.has(w)).length / words.length; return overlap >= ECHO_WORD_OVERLAP_THRESHOLD; @@ -378,7 +386,11 @@ export function looksLikeEcho(candidate: string, agentText: string): boolean { * ``_is_near_duplicate``. */ export function isNearDuplicate(a: string, b: string): boolean { if (!a || !b) return false; - return a === b || a.includes(b) || b.includes(a); + if (a === b) return true; + const [shorter, longer] = a.length <= b.length ? [a, b] : [b, a]; + // Word-boundary aware: a character infix ("no" in "nothing else") is NOT a + // duplicate; only a true word-prefix double-emit (speech_final+is_final) is. + return longer.startsWith(shorter + ' '); } // --------------------------------------------------------------------------- @@ -2704,12 +2716,16 @@ export class StreamHandler { return; } } else if (this.llmLoop) { - responseText = await this.runPipelineLlm( + const llmResult = await this.runPipelineLlm( filteredTranscript, hookExecutor, hookCtx, historySnapshot, ); + responseText = llmResult.text; + // OR in whether the LLM stream itself was cut short, in addition to a + // barge-in already seen by handleBargeIn at the top of this turn. 
+ interrupted = interrupted || llmResult.interrupted; } else { getLogger().warn( `Pipeline (${label}) has no llm/onMessage handler — transcript ` + @@ -2722,8 +2738,14 @@ export class StreamHandler { if (!responseText) return; if (this.llmLoop) { - await this.emitAssistantTranscript(responseText); - this.metricsAcc.recordTtsComplete(responseText); + // Marker goes to the history/transcript ONLY (so a stateful agent + // runtime sees it was interrupted); metrics use the PLAIN text and are + // gated on !interrupted — mirrors Python. + const spokenText = interrupted + ? `${responseText} [interrupted by caller]` + : responseText; + await this.emitAssistantTranscript(spokenText); + if (!interrupted) this.metricsAcc.recordTtsComplete(responseText); } else { interrupted = (await this.runRegularLlm(responseText, hookExecutor, hookCtx)) || interrupted; // ``runRegularLlm`` returns the possibly-replaced text via side effect on @@ -3048,14 +3070,16 @@ export class StreamHandler { /** * Streaming built-in LLM path with sentence chunking and per-sentence - * guardrails/TTS. Returns the concatenated response text. + * guardrails/TTS. Returns the concatenated (plain) response text plus whether + * the turn was cut short by a barge-in — the caller applies the interrupted + * marker to history only, keeping metrics on the plain text. */ private async runPipelineLlm( filteredTranscript: string, hookExecutor: PipelineHookExecutor, hookCtx: HookContext, historySnapshot: Array<{ role: string; text: string }>, - ): Promise { + ): Promise<{ text: string; interrupted: boolean }> { const label = this.deps.bridge.label; const callCtx = { call_id: this.callId, caller: this.caller, callee: this.callee }; const chunker = new SentenceChunker({ @@ -3205,16 +3229,13 @@ export class StreamHandler { // Swallow — span teardown should never crash the call path. 
} } - const responseText = allParts.join(''); - // Tag the spoken prefix with an ``[interrupted by caller]`` marker when the - // turn was cut short, so a stateful agent runtime (Hermes/OpenClaw) sees, - // next turn, that it was interrupted and what the caller actually heard — - // not an ungrounded full reply that pollutes its context. Parity with - // Python ``_process_streaming_response``. - if (llmSignal.aborted && responseText) { - return `${responseText} [interrupted by caller]`; - } - return responseText; + // Return the PLAIN text plus whether the turn was cut short. The caller + // (dispatchTurn) records metrics on the plain text and applies the + // ``[interrupted by caller]`` marker only to the history/transcript, so + // metrics (TTS cost, turn-complete) are never polluted by the marker. + // Parity with Python, where metrics are recorded on the unmarked text + // inside ``_process_streaming_response`` before the marker is appended. + return { text: allParts.join(''), interrupted: llmSignal.aborted }; } /** diff --git a/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts b/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts index b36fbc1..9f59674 100644 --- a/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts +++ b/libraries/typescript/tests/pipeline-echo-dedup.mocked.test.ts @@ -62,11 +62,21 @@ describe('[mocked] echo + dedup helpers', () => { it('normalizeForEcho strips punctuation and case', () => { expect(normalizeForEcho('Ciao, come VA?!')).toBe('ciao come va'); }); - it('looksLikeEcho: substring fragment is echo', () => { - expect(looksLikeEcho('una storia molto', 'Certo, ti racconto una storia molto lunga')).toBe(true); + it('looksLikeEcho: long substring fragment is echo', () => { + expect( + looksLikeEcho('ti racconto una storia molto', 'Certo, ti racconto una storia molto lunga'), + ).toBe(true); }); - it('looksLikeEcho: high word overlap is echo', () => { - expect(looksLikeEcho("che tu l'hai", "che tu lo voglia 
o no, te l'ho già detto")).toBe(true); + it('looksLikeEcho: long high-word-overlap fragment is echo', () => { + expect( + looksLikeEcho('che tu lo voglia detto', "che tu lo voglia o no, te l'ho già detto"), + ).toBe(true); + }); + it('looksLikeEcho: short answer repeating the agent is NOT echo', () => { + const agent = "preferisci lunedì o martedì per l'appuntamento"; + expect(looksLikeEcho('lunedì', agent)).toBe(false); + expect(looksLikeEcho('monday at two', agent)).toBe(false); + expect(looksLikeEcho('sì va bene', agent)).toBe(false); }); it('looksLikeEcho: unrelated user speech is not echo', () => { expect(looksLikeEcho('fermati dimmi solo interrotto', 'Sto bene grazie sono pronto ad aiutarti')).toBe(false); @@ -88,21 +98,28 @@ describe('[mocked] commitTranscript echo + dedup', () => { h.forwardSttWhileSpeaking = true; h.isSpeaking = true; h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; - expect(h.commitTranscript('una storia lunga')).toBe(false); + expect(h.commitTranscript('ti racconto una storia lunga')).toBe(false); }); it('does NOT drop echo when the flag is off (default)', () => { const h = makeHandler(); h.forwardSttWhileSpeaking = false; h.isSpeaking = true; h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; - expect(h.commitTranscript('una storia lunga')).toBe(true); + expect(h.commitTranscript('ti racconto una storia lunga')).toBe(true); }); it('does NOT drop when idle (post-turn user reply)', () => { const h = makeHandler(); h.forwardSttWhileSpeaking = true; h.isSpeaking = false; h.currentAgentSpokenText = 'ti racconto una storia lunga sul mare'; - expect(h.commitTranscript('una storia lunga')).toBe(true); + expect(h.commitTranscript('ti racconto una storia lunga')).toBe(true); + }); + it('does NOT drop a short answer repeating the agent (false-positive guard)', () => { + const h = makeHandler(); + h.forwardSttWhileSpeaking = true; + h.isSpeaking = true; + h.currentAgentSpokenText = "preferisci lunedì o martedì 
per l'appuntamento"; + expect(h.commitTranscript('lunedì')).toBe(true); }); it('keeps a different follow-up within 500ms (empty-[interrupted]-turn fix)', () => { const h = makeHandler(); From 200f915ba63fd79d32082f7506ac873f690c315a Mon Sep 17 00:00:00 2001 From: nicolotognoni Date: Tue, 9 Jun 2026 00:13:03 +0200 Subject: [PATCH 07/11] =?UTF-8?q?fix(pipeline):=20echo-safe=20barge-in=20f?= =?UTF-8?q?or=20forward-STT=20without=20AEC=20=E2=80=94=20defer=20VAD=20ca?= =?UTF-8?q?ncel=20to=20transcript?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no barge_in_strategies, a VAD speech_start during TTS cancelled the turn immediately. But that speech_start is very often the agent's own TTS echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw turn), so the agent self-interrupted almost every turn: a short normal reply "bene bene" produced agent_text='[interrupted]', and the next turn ran the LLM for seconds yet emitted tts_characters=0 (torn down before its first token). The echo guard only protected the transcript path; the raw VAD-energy cancel had none. Defer the VAD-energy cancel to transcript confirmation whenever forward_stt_while_speaking && aec is None — exactly as it already worked when barge_in_strategies are configured. The speech_start now marks the barge-in PENDING (agent keeps talking); the cancel fires only on a real transcript that survives the echo guard, else the agent resumes after barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH AEC keep the responsive immediate cancel — no behaviour change for existing configs. Full Python/TS parity. New tests drive the VAD path through on_audio_received / handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels immediately; a real transcript confirms, an echo transcript does not. 
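
Illustrative sketch of the decision described above (not the code in this patch): the dataclass and function names are hypothetical, and only the three inputs (forward_stt_while_speaking, aec, barge_in_strategies) come from the change itself. It assumes an absent canceller is represented as None.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BargeInConfig:
    """Hypothetical container for the three inputs the commit message names."""

    forward_stt_while_speaking: bool = False
    aec: Optional[object] = None  # None means no echo canceller on the link
    barge_in_strategies: Sequence[object] = ()


def should_defer_cancel(cfg: BargeInConfig) -> bool:
    """True when a VAD speech_start during TTS should only mark the barge-in
    pending, leaving the actual cancel to a transcript that passes the echo
    guard."""
    # Forwarding audio to STT during TTS without AEC: VAD energy may be the
    # agent's own echo, so raw speech_start is not trusted.
    no_aec_forwarding = cfg.forward_stt_while_speaking and cfg.aec is None
    # Configured strategies already required transcript confirmation.
    return bool(cfg.barge_in_strategies) or no_aec_forwarding
```

With these assumptions the default config and forward-STT with AEC both return False (immediate cancel), while forward-STT without AEC returns True (pending until confirmed or timed out after barge_in_confirm_ms).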
---
 CHANGELOG.md                                 |   1 +
 libraries/python/getpatter/stream_handler.py |  56 ++++++----
 .../test_pipeline_bargein_backgrounded.py    | 103 ++++++++++++++++++
 libraries/typescript/src/stream-handler.ts   |  16 ++-
 .../tests/unit/barge-in-two-stage.test.ts    |  80 ++++++++++++++
 5 files changed, 234 insertions(+), 22 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7c09b0e..bc85b86 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,7 @@
 - **Echo guard** — a language-agnostic check (`_looks_like_echo` / `looksLikeEcho`: substring or ≥60% word overlap against the agent's in-flight spoken text) now drops any candidate barge-in/commit that is the agent's own speech echoing back. Active only while forwarding audio during TTS, so the default VAD path and real post-turn replies are untouched.
 - **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn.
 - **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
+- **Forward-STT-without-AEC no longer self-interrupts on its own echo.** The remaining live Hermes/OpenClaw barge-in failure: with `PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC, and no `barge_in_strategies`, a VAD `speech_start` during TTS cancelled the turn immediately — but on a no-AEC link that `speech_start` is very often the agent's *own* TTS echo (or pre-first-token line noise during a long tool-running turn). The result was a cascade of false-positive interruptions: a short normal reply like "bene bene" produced `agent_text='[interrupted]'` with `bargein_ms≈0`, and the next turn's LLM ran for seconds but emitted `tts_characters=0` because it was torn down before its first token. The echo guard existed only on the *transcript* path, so the raw VAD-energy cancel had no protection. The VAD-energy cancel is now **deferred to transcript confirmation** whenever audio is forwarded during TTS without AEC (`forward_stt_while_speaking && aec is None`), exactly as it already was when `barge_in_strategies` are configured: the `speech_start` marks the barge-in *pending* (the agent keeps talking) and the cancel only fires once `_handle_barge_in` / `handleBargeIn` sees a real transcript that survives the echo guard; if none confirms within `barge_in_confirm_ms` (default 1500 ms) the agent resumes its sentence. The default VAD path and forward-STT *with* AEC keep the responsive immediate cancel — no behaviour change for existing configs. For the cleanest short-echo handling, still pair with `echo_cancellation=True` or `barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`. 
## 0.6.5 (2026-06-05) diff --git a/libraries/python/getpatter/stream_handler.py b/libraries/python/getpatter/stream_handler.py index 3d322ed..971992f 100644 --- a/libraries/python/getpatter/stream_handler.py +++ b/libraries/python/getpatter/stream_handler.py @@ -3250,9 +3250,7 @@ async def _filler() -> None: return asyncio.create_task(_filler()) - async def _cancel_long_turn_filler( - self, task: "asyncio.Task | None" - ) -> None: + async def _cancel_long_turn_filler(self, task: "asyncio.Task | None") -> None: """Cancel the long-turn filler task and await its teardown. Idempotent and race-safe: a ``None`` / already-finished task is a no-op, @@ -3269,7 +3267,9 @@ async def _cancel_long_turn_filler( except asyncio.CancelledError: pass except Exception: # pragma: no cover - defensive - logger.debug("long_turn_message filler task ended with error", exc_info=True) + logger.debug( + "long_turn_message filler task ended with error", exc_info=True + ) return None async def _process_streaming_response(self, result, call_id: str) -> str: @@ -3438,9 +3438,7 @@ async def _process_streaming_response(self, result, call_id: str) -> str: sentence = transformed # Real flushed audio about to play — cancel the filler. 
- long_turn_task = await self._cancel_long_turn_filler( - long_turn_task - ) + long_turn_task = await self._cancel_long_turn_filler(long_turn_task) if not await self._synthesize_sentence( sentence, hook_executor, hook_ctx, first_tts_chunk ): @@ -3758,7 +3756,9 @@ def _commit_transcript(self, text: str) -> bool: and getattr(self, "_is_speaking", False) and _looks_like_echo(text, getattr(self, "_current_agent_spoken_text", "")) ): - logger.debug("Dropped agent-echo transcript (not a user turn): %r", normalised[:40]) + logger.debug( + "Dropped agent-echo transcript (not a user turn): %r", normalised[:40] + ) return False if since_last < 2.0 and normalised == self._last_commit_text: logger.debug( @@ -4158,9 +4158,7 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: # silence bug). After this ``_is_speaking`` is False, so # the if/elif below is a no-op and the frame falls through # to STT. Parity with TS ``endTailGraceForNewTurn``. - if self._is_speaking and getattr( - self, "_tail_grace_active", False - ): + if self._is_speaking and getattr(self, "_tail_grace_active", False): await self._end_tail_grace_for_new_turn() phantom_suppressed = self._is_speaking and not self._can_barge_in() if phantom_suppressed: @@ -4192,13 +4190,31 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: # STT so the user's words are not silently lost. self._suppressed_speech_pending = True elif self._is_speaking: - # Caller spoke over in-flight TTS. With opt-in - # confirmation strategies the cancel is deferred - # until at least one strategy approves the user's - # transcript; otherwise we keep the legacy - # "cancel immediately" path so existing users - # see no behaviour change. - if self._barge_in_strategies: + # Caller spoke over in-flight TTS. 
The cancel is + # DEFERRED to transcript confirmation — instead of + # firing on raw VAD energy — when EITHER: + # (a) opt-in ``barge_in_strategies`` are configured + # (a strategy must approve the transcript), OR + # (b) we forward STT during TTS WITHOUT AEC. On a + # no-AEC link a VAD ``speech_start`` here is very + # often the agent's OWN TTS echo, not the caller; + # cancelling on it self-interrupts almost every + # turn (the "bene bene" → [interrupted] cascade + # seen on live Hermes/OpenClaw calls). Deferring + # lets ``_handle_barge_in`` run the echo guard on + # the resulting transcript and cancel only on real + # caller speech; if no transcript confirms within + # ``_barge_in_confirm_s`` the pending state times + # out and the agent resumes its sentence. + # Otherwise (default VAD path, or forward-STT WITH AEC + # where the canceller makes VAD trustworthy) the legacy + # immediate cancel runs — existing users see no change. + # Parity with TS speech_start ``deferCancel``. + defer_cancel = bool(self._barge_in_strategies) or ( + getattr(self, "_forward_stt_while_speaking", False) + and getattr(self, "_aec", None) is None + ) + if defer_cancel: await self._start_pending_barge_in() else: if self.metrics is not None: @@ -4233,9 +4249,7 @@ async def on_audio_received(self, audio_bytes: bytes) -> None: # ``OpenAICompatibleLLMProvider.stream``) closes # the request the instant this fires. Parity # with TS ``cancelSpeaking`` → ``llmAbort.abort``. 
- cancel_event = getattr( - self, "_llm_cancel_event", None - ) + cancel_event = getattr(self, "_llm_cancel_event", None) if cancel_event is not None: cancel_event.set() if not phantom_suppressed and self.metrics is not None: diff --git a/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py index f362e0c..fdf6187 100644 --- a/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py +++ b/libraries/python/tests/unit/test_pipeline_bargein_backgrounded.py @@ -208,3 +208,106 @@ async def test_flag_on_forwards_to_stt_during_tts(self, monkeypatch) -> None: # leading edge for flush-on-barge-in. assert handler._stt.send_audio.await_count == 2 assert len(handler._inbound_audio_ring) == 2 + + +class _PassthroughAEC: + """Minimal AEC stand-in: marks the link as AEC-protected without altering audio.""" + + def process_near_end(self, pcm: bytes) -> bytes: + return pcm + + +@pytest.mark.unit +@pytest.mark.asyncio +class TestForwardSttDeferredBargeIn: + """On a forward-STT link WITHOUT AEC, a VAD ``speech_start`` during TTS is + very often the agent's own echo. Cancelling on raw VAD energy self-interrupts + almost every turn (live Hermes "bene bene" → [interrupted] cascade). The fix + defers the cancel to a transcript that survives the echo guard. 
+ """ + + def _speaking_handler(self, *, forward: bool, aec) -> PipelineStreamHandler: + handler = _make_handler() + handler._forward_stt_while_speaking = forward + handler._aec = aec + handler._barge_in_strategies = () + handler._auto_vad = _ScriptedVAD([VADEvent(type="speech_start")]) + handler._stt = AsyncMock() + handler._is_speaking = True + handler._tail_grace_active = False + handler._speaking_generation = 1 + handler._speaking_started_at = time.time() - 2.0 + handler._first_audio_sent_at = time.time() - 2.0 + handler._inbound_audio_ring = [] + handler._can_barge_in = lambda: True # type: ignore[assignment] + return handler + + async def test_no_aec_no_strategies_defers_to_pending(self) -> None: + """forward-STT + no AEC + no strategies → VAD speech_start goes PENDING, + does NOT cancel: the agent keeps talking until a real transcript confirms.""" + handler = self._speaking_handler(forward=True, aec=None) + + await handler.on_audio_received(_FRAME) + + # Deferred: no cancel, agent still owns the floor, LLM stream untouched. + assert handler._is_speaking is True + assert handler._barge_in_pending_since is not None + handler.audio_sender.send_clear.assert_not_called() + assert handler._llm_cancel_event.is_set() is False + # Clean up the pending-timeout task so it doesn't outlive the test. 
+ handler._clear_pending_barge_in() + + async def test_with_aec_still_immediate_cancel(self) -> None: + """forward-STT + AEC ON → the canceller makes VAD trustworthy, so the + legacy immediate cancel is preserved (responsive barge-in).""" + handler = self._speaking_handler(forward=True, aec=_PassthroughAEC()) + + await handler.on_audio_received(_FRAME) + + assert handler._is_speaking is False + assert handler._barge_in_pending_since is None + handler.audio_sender.send_clear.assert_awaited() + assert handler._llm_cancel_event.is_set() is True + + async def test_pending_then_real_transcript_cancels(self) -> None: + """After a deferred (pending) VAD barge-in, a real (non-echo) transcript + confirms the cancel via the echo-guarded transcript path.""" + handler = self._speaking_handler(forward=True, aec=None) + handler._current_agent_spoken_text = "sto bene grazie e tu come stai oggi" + + await handler.on_audio_received(_FRAME) + assert handler._barge_in_pending_since is not None # pending + + # A genuinely different caller utterance (not the agent's own words). + await handler._handle_barge_in( + Transcript( + text="fermati e dimmi solo questo", is_final=True, confidence=0.9 + ) + ) + + assert handler._is_speaking is False + assert handler._barge_in_pending_since is None # pending cleared on confirm + handler.audio_sender.send_clear.assert_awaited() + assert handler._llm_cancel_event.is_set() is True + + async def test_pending_then_echo_transcript_does_not_cancel(self) -> None: + """An echo transcript (the agent's own forwarded TTS) must NOT confirm + the pending barge-in — the agent keeps talking.""" + handler = self._speaking_handler(forward=True, aec=None) + handler._current_agent_spoken_text = "sto bene grazie e tu come stai oggi" + + await handler.on_audio_received(_FRAME) + assert handler._barge_in_pending_since is not None # pending + + # Transcript is a fragment of what the agent is currently saying. 
+ await handler._handle_barge_in( + Transcript( + text="sto bene grazie e tu come stai", is_final=True, confidence=0.9 + ) + ) + + # Echo guard dropped it — no cancel, still pending, agent still speaking. + assert handler._is_speaking is True + handler.audio_sender.send_clear.assert_not_called() + assert handler._llm_cancel_event.is_set() is False + handler._clear_pending_barge_in() diff --git a/libraries/typescript/src/stream-handler.ts b/libraries/typescript/src/stream-handler.ts index 7757313..b47ffc0 100644 --- a/libraries/typescript/src/stream-handler.ts +++ b/libraries/typescript/src/stream-handler.ts @@ -1615,7 +1615,21 @@ export class StreamHandler { // agent finishes naturally without a barge-in. this.suppressedSpeechPending = true; } else if (this.isSpeaking) { - if (this.bargeInStrategies.length > 0) { + // Defer the cancel to transcript confirmation — instead of + // firing on raw VAD energy — when EITHER opt-in + // ``bargeInStrategies`` are configured OR we forward STT during + // TTS WITHOUT AEC. On a no-AEC link a VAD ``speech_start`` here + // is very often the agent's OWN echo, and cancelling on it + // self-interrupts almost every turn (the "bene bene" → + // [interrupted] cascade). Deferring lets ``handleBargeIn`` run + // the echo guard on the resulting transcript and cancel only on + // real caller speech; the pending state times out after + // ``bargeInConfirmS`` so the agent resumes if nothing confirms. + // Parity with Python on_audio_received ``defer_cancel``. 
+ const deferCancel = + this.bargeInStrategies.length > 0 || + (this.forwardSttWhileSpeaking && !this.aec); + if (deferCancel) { this.startPendingBargeIn(); this.metricsAcc.anchorUserSpeechStart(); return; diff --git a/libraries/typescript/tests/unit/barge-in-two-stage.test.ts b/libraries/typescript/tests/unit/barge-in-two-stage.test.ts index 2fc35e6..8d7b50d 100644 --- a/libraries/typescript/tests/unit/barge-in-two-stage.test.ts +++ b/libraries/typescript/tests/unit/barge-in-two-stage.test.ts @@ -337,6 +337,86 @@ describe('StreamHandler — handleStop / handleWsClose drops pending barge-in ti }); }); +/** Scripted VAD: returns queued events frame-by-frame, then silence. */ +function makeScriptedVad(events: Array<{ type: string } | null>) { + const queue = [...events]; + return { + async processFrame(): Promise<{ type: string } | null> { + return queue.shift() ?? null; + }, + async close(): Promise<void> {}, + reset(): void {}, + }; +} + +describe('StreamHandler — forward-STT-without-AEC defers VAD-energy barge-in (Hermes/OpenClaw)', () => { + beforeEach(() => { + // Real timers — handleAudio races the VAD promise against a 25 ms timeout.
+ vi.useRealTimers(); + }); + afterEach(() => { + vi.restoreAllMocks(); + delete process.env.PATTER_FORWARD_STT_WHILE_SPEAKING; + }); + + interface HandleAudioPriv { + isSpeaking: boolean; + bargeInPendingSince: number | null; + forwardSttWhileSpeaking: boolean; + aec: unknown; + autoVad: unknown; + stt: unknown; + inboundAudioRing: Buffer[]; + canBargeIn: () => boolean; + clearPendingBargeIn: () => void; + } + + function armForwardStt( + h: StreamHandler, + aec: unknown, + ): HandleAudioPriv { + armSpeakingState(h); + const p = h as unknown as HandleAudioPriv; + p.stt = { sendAudio: vi.fn() }; + p.autoVad = makeScriptedVad([{ type: 'speech_start' }]); + p.forwardSttWhileSpeaking = true; + p.aec = aec; + p.inboundAudioRing = []; + p.canBargeIn = () => true; + return p; + } + + it('no AEC + no strategies → VAD speech_start DEFERS to pending (no immediate cancel)', async () => { + const deps = makeDeps([]); // legacy config — no opt-in strategies + const h = new StreamHandler(deps, makeMockWs(), '+1', '+2'); + const p = armForwardStt(h, null); + + await h.handleAudio(Buffer.alloc(160)); // 20 ms mulaw frame + + // Deferred: the agent keeps the floor; the cancel waits for a transcript + // that survives the echo guard (the "bene bene" → [interrupted] fix). 
+ expect(p.isSpeaking).toBe(true); + expect(p.bargeInPendingSince).not.toBeNull(); + expect(deps.bridge.sendClear).not.toHaveBeenCalled(); + p.clearPendingBargeIn(); + }); + + it('AEC ON → VAD speech_start still cancels immediately (canceller makes VAD trustworthy)', async () => { + const deps = makeDeps([]); + const h = new StreamHandler(deps, makeMockWs(), '+1', '+2'); + const p = armForwardStt(h, { + processNearEnd: (b: Buffer) => b, + pushFarEnd: () => {}, + }); + + await h.handleAudio(Buffer.alloc(160)); + + expect(p.isSpeaking).toBe(false); + expect(p.bargeInPendingSince).toBeNull(); + expect(deps.bridge.sendClear).toHaveBeenCalled(); + }); +}); + describe('MinWordsStrategy threshold parity (TS↔Py)', () => { it.each([2, 3, 5])( 'agent stays talking below threshold and cancels at threshold (minWords=%i)', From d06bf6294b853f0d0f6ba3b4ee1263df79e6057f Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 9 Jun 2026 07:24:21 +0000 Subject: [PATCH 08/11] =?UTF-8?q?feat(hermes):=20zero-config=20CLI=20?= =?UTF-8?q?=E2=80=94=20`patter=20hermes`=20doctor=20/=20setup=20/=20attach?= =?UTF-8?q?-number=20+=20example=20app?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback. New `patter hermes` CLI group (Python): - `doctor` — preflight across the Hermes gateway (/v1/models reachability + model presence), the Patter providers (HermesLLM constructible, Deepgram / ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier (creds valid, number webhook). Each problem prints a suggested fix. `--no-network` skips live probes, `--json` for machine-readable output. - `setup` — scaffold a ready-to-run hermes-phone-agent project, run the checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive with `--yes`. 
- `attach-number` / `numbers` — point a Twilio number's voice webhook at your Patter URL / list account numbers. Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the committed `examples/hermes-phone-agent/` project (app.py, .env.example, README, docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in sync. The example defaults to REST ElevenLabs TTS and caller-hash memory. TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval` stub); the HermesLLM provider stays available in both SDKs. Docs updated with a zero-config setup section. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts --- docs/integrations/hermes.mdx | 28 + examples/hermes-phone-agent/.env.example | 21 + examples/hermes-phone-agent/README.md | 59 ++ examples/hermes-phone-agent/app.py | 60 ++ .../hermes-phone-agent/docker-compose.yml | 14 + examples/hermes-phone-agent/scripts/doctor.py | 7 + .../scripts/test_outbound_call.py | 50 ++ .../scripts/test_text_turn.py | 41 ++ .../python/getpatter/_hermes_scaffold.py | 333 +++++++++ libraries/python/getpatter/cli.py | 7 + libraries/python/getpatter/cli_hermes.py | 649 ++++++++++++++++++ .../python/tests/unit/test_hermes_cli.py | 184 +++++ libraries/typescript/src/cli.ts | 18 + 13 files changed, 1471 insertions(+) create mode 100644 examples/hermes-phone-agent/.env.example create mode 100644 examples/hermes-phone-agent/README.md create mode 100644 examples/hermes-phone-agent/app.py create mode 100644 examples/hermes-phone-agent/docker-compose.yml create mode 100644 examples/hermes-phone-agent/scripts/doctor.py create mode 100644 examples/hermes-phone-agent/scripts/test_outbound_call.py create mode 100644 examples/hermes-phone-agent/scripts/test_text_turn.py create mode 100644 libraries/python/getpatter/_hermes_scaffold.py create mode 100644 libraries/python/getpatter/cli_hermes.py create mode 100644 libraries/python/tests/unit/test_hermes_cli.py diff --git 
a/docs/integrations/hermes.mdx b/docs/integrations/hermes.mdx index 006ed0f..160c48c 100644 --- a/docs/integrations/hermes.mdx +++ b/docs/integrations/hermes.mdx @@ -164,6 +164,34 @@ gateway that isn't listening. `hermes-agent`). +### Zero-config setup (Python) + +If you'd rather not wire it up by hand, the Python CLI scaffolds a ready-to-run project, +checks your environment, and can point your Twilio number at Patter: + +```bash +pip install getpatter + +patter hermes doctor # preflight: gateway, providers, carrier — with fixes +patter hermes setup # scaffold ./hermes-phone-agent (app.py, .env, scripts) +``` + +`patter hermes doctor` probes the Hermes gateway (`/v1/models`), confirms `HermesLLM` is +constructible, checks your Deepgram / ElevenLabs / Twilio credentials, and prints a +suggested fix for anything missing (`--no-network` skips live probes, `--json` for +machine-readable output). `patter hermes setup` writes the same starter project shown in +[`examples/hermes-phone-agent`](https://github.com/PatterAI/Patter/tree/main/examples/hermes-phone-agent) +and, given `--number` and `--url`, attaches the Twilio webhook for you. To wire an existing +number on its own: + +```bash +patter hermes numbers # list the numbers on your Twilio account +patter hermes attach-number +15551234567 --url https:///calls/inbound +``` + +These commands live in the Python SDK today; the `HermesLLM` provider itself is available +in both the Python and TypeScript SDKs. + ### Running Patter locally Build a pipeline-mode agent whose LLM is `HermesLLM`. 
Patter wraps the carrier, STT, and diff --git a/examples/hermes-phone-agent/.env.example b/examples/hermes-phone-agent/.env.example new file mode 100644 index 0000000..de760cf --- /dev/null +++ b/examples/hermes-phone-agent/.env.example @@ -0,0 +1,21 @@ +# ── Hermes gateway (the brain — keep it on loopback) ────────────────── +API_SERVER_ENABLED=true +API_SERVER_HOST=127.0.0.1 +API_SERVER_PORT=8642 +API_SERVER_KEY=choose-a-strong-key +API_SERVER_MODEL_NAME=hermes-agent + +# ── Patter (the voice shell) ────────────────────────────────────────── +PATTER_PHONE_NUMBER=+15551234567 +PATTER_LANGUAGE=en +# REST is the safer default for a first PSTN demo; set to ws for streaming. +PATTER_ELEVENLABS_TRANSPORT=rest + +# ── Twilio carrier ──────────────────────────────────────────────────── +TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +TWILIO_AUTH_TOKEN=your-twilio-auth-token + +# ── STT / TTS providers ─────────────────────────────────────────────── +DEEPGRAM_API_KEY=your-deepgram-key +ELEVENLABS_API_KEY=your-elevenlabs-key +# ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL diff --git a/examples/hermes-phone-agent/README.md b/examples/hermes-phone-agent/README.md new file mode 100644 index 0000000..3b5298c --- /dev/null +++ b/examples/hermes-phone-agent/README.md @@ -0,0 +1,59 @@ +# Hermes phone agent + +A self-hosted phone line for your [Hermes Agent](https://github.com/NousResearch/hermes-agent). +Patter is the **voice shell** (carrier, speech-to-text, turn-taking, barge-in, +text-to-speech); Hermes is the **brain** on the line. Each conversation turn is +one `POST http://127.0.0.1:8642/v1/chat/completions` against your local Hermes +gateway — so Hermes keeps its tools, memory, and skills, and **never leaves +loopback**. The only thing exposed to the internet is Patter's carrier webhook. + +## 1. Configure + +```bash +cp .env.example .env +# then fill in API_SERVER_KEY, TWILIO_*, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, +# and PATTER_PHONE_NUMBER +``` + +## 2. 
Check everything is wired up + +```bash +pip install getpatter +patter hermes doctor +``` + +Fix anything it flags (it prints a suggested command for each problem), then +smoke-test the brain without spending a phone call: + +```bash +python scripts/test_text_turn.py "say hello in one sentence" +``` + +## 3. Answer the phone + +```bash +python app.py +``` + +Patter opens a tunnel and prints the public webhook URL. Point your Twilio +number's voice webhook at it — or let Patter do it for you: + +```bash +patter hermes attach-number "$PATTER_PHONE_NUMBER" --url https:///calls/inbound +``` + +Now call your number and talk to Hermes. + +## 4. Place an outbound call (optional) + +```bash +python scripts/test_outbound_call.py +15557654321 +``` + +## Why Patter instead of a hosted custom-LLM voice agent? + +- **Hermes stays private.** A hosted platform has to reach your "brain" endpoint + over the public internet; here Hermes is loopback-only and only Patter is + exposed. +- **You own the voice layer** — STT, turn-taking, barge-in, TTS — and can script it. +- **Inbound *and* outbound**, plus the Patter MCP server so Hermes can place calls. diff --git a/examples/hermes-phone-agent/app.py b/examples/hermes-phone-agent/app.py new file mode 100644 index 0000000..f939d78 --- /dev/null +++ b/examples/hermes-phone-agent/app.py @@ -0,0 +1,60 @@ +"""Hermes phone agent — Patter is the voice shell, Hermes is the brain. + +A caller dials your number, Patter answers (carrier + STT + turn-taking + TTS), +and every conversation turn is routed to your local Hermes gateway as the LLM. +Hermes stays on loopback (127.0.0.1:8642); only Patter's carrier webhook is +exposed to the internet, via the tunnel. 
+ +Run: + python app.py + +Check your setup first with: + patter hermes doctor +""" + +from __future__ import annotations + +import os + +from getpatter import ( + DeepgramSTT, + ElevenLabsRestTTS, + ElevenLabsTTS, + HermesLLM, + Patter, + Twilio, +) + +# REST TTS is the safer default for a first PSTN demo: there is no long-lived +# WebSocket that can stall before the first audio frame. Set +# PATTER_ELEVENLABS_TRANSPORT=ws to opt into streaming once the basics work. +if os.environ.get("PATTER_ELEVENLABS_TRANSPORT", "rest").lower() == "ws": + tts = ElevenLabsTTS.for_twilio() +else: + tts = ElevenLabsRestTTS.for_twilio() + +phone = Patter( + carrier=Twilio(), # TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN + phone_number=os.environ["PATTER_PHONE_NUMBER"], + tunnel=True, # auto Cloudflare quick-tunnel (local dev) +) + +agent = phone.agent( + system_prompt=( + "You are Hermes on a live phone call. Keep replies concise, warm, and " + "spoken-friendly. Avoid markdown, code blocks, long lists, and URLs " + "unless the caller asks. If you use a tool, say you are checking, then " + "summarize the result naturally. If interrupted, stop and answer the " + "latest request." + ), + language=os.environ.get("PATTER_LANGUAGE", "en"), + first_message="Hello, this is Hermes. How can I help?", + stt=DeepgramSTT(), # DEEPGRAM_API_KEY + llm=HermesLLM(session_key_from="caller_hash"), # http://127.0.0.1:8642/v1 + tts=tts, # ELEVENLABS_API_KEY + long_turn_message="One moment, let me check that.", + llm_error_message="Sorry, I'm having trouble reaching Hermes right now.", +) + +if __name__ == "__main__": + phone.serve(agent) # answers inbound calls diff --git a/examples/hermes-phone-agent/docker-compose.yml b/examples/hermes-phone-agent/docker-compose.yml new file mode 100644 index 0000000..cffcb9b --- /dev/null +++ b/examples/hermes-phone-agent/docker-compose.yml @@ -0,0 +1,14 @@ +# Patter + Hermes on one box. Hermes stays on loopback; only Patter is exposed. 
+# +# This runs the Patter voice shell in a container that shares the host network +# so it can reach the Hermes gateway on 127.0.0.1:8642. Start your Hermes +# gateway on the host first (see the Hermes docs), then `docker compose up`. +services: + patter: + image: python:3.12-slim + working_dir: /app + env_file: .env + network_mode: host # so Patter reaches Hermes on 127.0.0.1:8642 + volumes: + - .:/app + command: sh -c "pip install --quiet getpatter && python app.py" diff --git a/examples/hermes-phone-agent/scripts/doctor.py b/examples/hermes-phone-agent/scripts/doctor.py new file mode 100644 index 0000000..0f13930 --- /dev/null +++ b/examples/hermes-phone-agent/scripts/doctor.py @@ -0,0 +1,7 @@ +#!/usr/bin/env python +"""Run the Patter Hermes preflight checks (wraps `patter hermes doctor`).""" + +import subprocess +import sys + +raise SystemExit(subprocess.call(["patter", "hermes", "doctor", *sys.argv[1:]])) diff --git a/examples/hermes-phone-agent/scripts/test_outbound_call.py b/examples/hermes-phone-agent/scripts/test_outbound_call.py new file mode 100644 index 0000000..1c35f9b --- /dev/null +++ b/examples/hermes-phone-agent/scripts/test_outbound_call.py @@ -0,0 +1,50 @@ +#!/usr/bin/env python +"""Place a test outbound call through the Hermes voice shell. + + python scripts/test_outbound_call.py +15557654321 + +The callee picks up and talks to Hermes. Requires the same env as app.py. +""" + +from __future__ import annotations + +import asyncio +import os +import sys + +from getpatter import ( + DeepgramSTT, + ElevenLabsRestTTS, + HermesLLM, + Patter, + Twilio, +) + + +async def main() -> int: + if len(sys.argv) < 2: + print("Usage: python scripts/test_outbound_call.py <+E164>") + return 2 + to = sys.argv[1] + phone = Patter( + carrier=Twilio(), + phone_number=os.environ["PATTER_PHONE_NUMBER"], + tunnel=True, + ) + agent = phone.agent( + system_prompt=( + "You are Hermes on a short test call. Greet the person warmly and " + "ask how they are. 
Keep it brief and spoken-friendly." + ), + first_message="Hi, this is a Patter and Hermes test call.", + stt=DeepgramSTT(), + llm=HermesLLM(session_key_from="caller_hash"), + tts=ElevenLabsRestTTS.for_twilio(), + ) + result = await phone.call(to, agent=agent, wait=True) + print(f"Call outcome: {result.outcome if result else 'unknown'}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(asyncio.run(main())) diff --git a/examples/hermes-phone-agent/scripts/test_text_turn.py b/examples/hermes-phone-agent/scripts/test_text_turn.py new file mode 100644 index 0000000..ddda492 --- /dev/null +++ b/examples/hermes-phone-agent/scripts/test_text_turn.py @@ -0,0 +1,41 @@ +#!/usr/bin/env python +"""Text-only smoke test: send one turn to the local Hermes gateway. + +Verifies the brain answers before you spend a phone call debugging it. Reads +API_SERVER_PORT / API_SERVER_MODEL_NAME / API_SERVER_KEY from the environment. + + python scripts/test_text_turn.py "say hello in one short sentence" +""" + +from __future__ import annotations + +import json +import os +import sys +import urllib.request + +base = f"http://127.0.0.1:{os.environ.get('API_SERVER_PORT', '8642')}/v1" +model = os.environ.get("API_SERVER_MODEL_NAME", "hermes-agent") +key = os.environ.get("API_SERVER_KEY", "") +prompt = " ".join(sys.argv[1:]) or "Say hello in one short sentence." 
+ +headers = {"Content-Type": "application/json"} +if key: + headers["Authorization"] = f"Bearer {key}" + +req = urllib.request.Request( + f"{base}/chat/completions", + data=json.dumps( + {"model": model, "messages": [{"role": "user", "content": prompt}]} + ).encode(), + headers=headers, +) + +try: + with urllib.request.urlopen(req, timeout=120) as resp: # noqa: S310 + data = json.load(resp) +except Exception as exc: # noqa: BLE001 + print(f"Hermes did not answer: {exc}") + raise SystemExit(1) + +print(data["choices"][0]["message"]["content"]) diff --git a/libraries/python/getpatter/_hermes_scaffold.py b/libraries/python/getpatter/_hermes_scaffold.py new file mode 100644 index 0000000..fe9b7a5 --- /dev/null +++ b/libraries/python/getpatter/_hermes_scaffold.py @@ -0,0 +1,333 @@ +"""Project scaffold for the Hermes phone agent. + +Single source of truth for the ``hermes-phone-agent`` starter project. The +``patter hermes setup`` wizard (see :mod:`getpatter.cli_hermes`) writes these +files for a user, and the committed ``examples/hermes-phone-agent/`` tree is +generated from the same :data:`FILES` map (a test asserts they stay in sync). + +Each entry maps a project-relative path to its file contents. Keep the contents +runnable against the real public API — they double as the example the docs +point at. +""" + +from __future__ import annotations + +from pathlib import Path + +__all__ = ["FILES", "scaffold"] + + +_APP_PY = '''\ +"""Hermes phone agent — Patter is the voice shell, Hermes is the brain. + +A caller dials your number, Patter answers (carrier + STT + turn-taking + TTS), +and every conversation turn is routed to your local Hermes gateway as the LLM. +Hermes stays on loopback (127.0.0.1:8642); only Patter's carrier webhook is +exposed to the internet, via the tunnel. 
+ +Run: + python app.py + +Check your setup first with: + patter hermes doctor +""" + +from __future__ import annotations + +import os + +from getpatter import ( + DeepgramSTT, + ElevenLabsRestTTS, + ElevenLabsTTS, + HermesLLM, + Patter, + Twilio, +) + +# REST TTS is the safer default for a first PSTN demo: there is no long-lived +# WebSocket that can stall before the first audio frame. Set +# PATTER_ELEVENLABS_TRANSPORT=ws to opt into streaming once the basics work. +if os.environ.get("PATTER_ELEVENLABS_TRANSPORT", "rest").lower() == "ws": + tts = ElevenLabsTTS.for_twilio() +else: + tts = ElevenLabsRestTTS.for_twilio() + +phone = Patter( + carrier=Twilio(), # TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN + phone_number=os.environ["PATTER_PHONE_NUMBER"], + tunnel=True, # auto Cloudflare quick-tunnel (local dev) +) + +agent = phone.agent( + system_prompt=( + "You are Hermes on a live phone call. Keep replies concise, warm, and " + "spoken-friendly. Avoid markdown, code blocks, long lists, and URLs " + "unless the caller asks. If you use a tool, say you are checking, then " + "summarize the result naturally. If interrupted, stop and answer the " + "latest request." + ), + language=os.environ.get("PATTER_LANGUAGE", "en"), + first_message="Hello, this is Hermes. 
How can I help?", + stt=DeepgramSTT(), # DEEPGRAM_API_KEY + llm=HermesLLM(session_key_from="caller_hash"), # http://127.0.0.1:8642/v1 + tts=tts, # ELEVENLABS_API_KEY + long_turn_message="One moment, let me check that.", + llm_error_message="Sorry, I'm having trouble reaching Hermes right now.", +) + +if __name__ == "__main__": + phone.serve(agent) # answers inbound calls +''' + + +_ENV_EXAMPLE = """\ +# ── Hermes gateway (the brain — keep it on loopback) ────────────────── +API_SERVER_ENABLED=true +API_SERVER_HOST=127.0.0.1 +API_SERVER_PORT=8642 +API_SERVER_KEY=choose-a-strong-key +API_SERVER_MODEL_NAME=hermes-agent + +# ── Patter (the voice shell) ────────────────────────────────────────── +PATTER_PHONE_NUMBER=+15551234567 +PATTER_LANGUAGE=en +# REST is the safer default for a first PSTN demo; set to ws for streaming. +PATTER_ELEVENLABS_TRANSPORT=rest + +# ── Twilio carrier ──────────────────────────────────────────────────── +TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +TWILIO_AUTH_TOKEN=your-twilio-auth-token + +# ── STT / TTS providers ─────────────────────────────────────────────── +DEEPGRAM_API_KEY=your-deepgram-key +ELEVENLABS_API_KEY=your-elevenlabs-key +# ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL +""" + + +_README = """\ +# Hermes phone agent + +A self-hosted phone line for your [Hermes Agent](https://github.com/NousResearch/hermes-agent). +Patter is the **voice shell** (carrier, speech-to-text, turn-taking, barge-in, +text-to-speech); Hermes is the **brain** on the line. Each conversation turn is +one `POST http://127.0.0.1:8642/v1/chat/completions` against your local Hermes +gateway — so Hermes keeps its tools, memory, and skills, and **never leaves +loopback**. The only thing exposed to the internet is Patter's carrier webhook. + +## 1. Configure + +```bash +cp .env.example .env +# then fill in API_SERVER_KEY, TWILIO_*, DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, +# and PATTER_PHONE_NUMBER +``` + +## 2. 
Check everything is wired up + +```bash +pip install getpatter +patter hermes doctor +``` + +Fix anything it flags (it prints a suggested command for each problem), then +smoke-test the brain without spending a phone call: + +```bash +python scripts/test_text_turn.py "say hello in one sentence" +``` + +## 3. Answer the phone + +```bash +python app.py +``` + +Patter opens a tunnel and prints the public webhook URL. Point your Twilio +number's voice webhook at it — or let Patter do it for you: + +```bash +patter hermes attach-number "$PATTER_PHONE_NUMBER" --url https:///calls/inbound +``` + +Now call your number and talk to Hermes. + +## 4. Place an outbound call (optional) + +```bash +python scripts/test_outbound_call.py +15557654321 +``` + +## Why Patter instead of a hosted custom-LLM voice agent? + +- **Hermes stays private.** A hosted platform has to reach your "brain" endpoint + over the public internet; here Hermes is loopback-only and only Patter is + exposed. +- **You own the voice layer** — STT, turn-taking, barge-in, TTS — and can script it. +- **Inbound *and* outbound**, plus the Patter MCP server so Hermes can place calls. +""" + + +_DOCKER_COMPOSE = """\ +# Patter + Hermes on one box. Hermes stays on loopback; only Patter is exposed. +# +# This runs the Patter voice shell in a container that shares the host network +# so it can reach the Hermes gateway on 127.0.0.1:8642. Start your Hermes +# gateway on the host first (see the Hermes docs), then `docker compose up`. 
+services: + patter: + image: python:3.12-slim + working_dir: /app + env_file: .env + network_mode: host # so Patter reaches Hermes on 127.0.0.1:8642 + volumes: + - .:/app + command: sh -c "pip install --quiet getpatter && python app.py" +""" + + +_SCRIPT_DOCTOR = '''\ +#!/usr/bin/env python +"""Run the Patter Hermes preflight checks (wraps `patter hermes doctor`).""" + +import subprocess +import sys + +raise SystemExit(subprocess.call(["patter", "hermes", "doctor", *sys.argv[1:]])) +''' + + +_SCRIPT_TEXT_TURN = '''\ +#!/usr/bin/env python +"""Text-only smoke test: send one turn to the local Hermes gateway. + +Verifies the brain answers before you spend a phone call debugging it. Reads +API_SERVER_PORT / API_SERVER_MODEL_NAME / API_SERVER_KEY from the environment. + + python scripts/test_text_turn.py "say hello in one short sentence" +""" + +from __future__ import annotations + +import json +import os +import sys +import urllib.request + +base = f"http://127.0.0.1:{os.environ.get('API_SERVER_PORT', '8642')}/v1" +model = os.environ.get("API_SERVER_MODEL_NAME", "hermes-agent") +key = os.environ.get("API_SERVER_KEY", "") +prompt = " ".join(sys.argv[1:]) or "Say hello in one short sentence." + +headers = {"Content-Type": "application/json"} +if key: + headers["Authorization"] = f"Bearer {key}" + +req = urllib.request.Request( + f"{base}/chat/completions", + data=json.dumps( + {"model": model, "messages": [{"role": "user", "content": prompt}]} + ).encode(), + headers=headers, +) + +try: + with urllib.request.urlopen(req, timeout=120) as resp: # noqa: S310 + data = json.load(resp) +except Exception as exc: # noqa: BLE001 + print(f"Hermes did not answer: {exc}") + raise SystemExit(1) + +print(data["choices"][0]["message"]["content"]) +''' + + +_SCRIPT_OUTBOUND = '''\ +#!/usr/bin/env python +"""Place a test outbound call through the Hermes voice shell. + + python scripts/test_outbound_call.py +15557654321 + +The callee picks up and talks to Hermes. 
Requires the same env as app.py. +""" + +from __future__ import annotations + +import asyncio +import os +import sys + +from getpatter import ( + DeepgramSTT, + ElevenLabsRestTTS, + HermesLLM, + Patter, + Twilio, +) + + +async def main() -> int: + if len(sys.argv) < 2: + print("Usage: python scripts/test_outbound_call.py <+E164>") + return 2 + to = sys.argv[1] + phone = Patter( + carrier=Twilio(), + phone_number=os.environ["PATTER_PHONE_NUMBER"], + tunnel=True, + ) + agent = phone.agent( + system_prompt=( + "You are Hermes on a short test call. Greet the person warmly and " + "ask how they are. Keep it brief and spoken-friendly." + ), + first_message="Hi, this is a Patter and Hermes test call.", + stt=DeepgramSTT(), + llm=HermesLLM(session_key_from="caller_hash"), + tts=ElevenLabsRestTTS.for_twilio(), + ) + result = await phone.call(to, agent=agent, wait=True) + print(f"Call outcome: {result.outcome if result else 'unknown'}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(asyncio.run(main())) +''' + + +# Project-relative path -> file contents. The committed +# ``examples/hermes-phone-agent/`` tree is generated from this map. +FILES: dict[str, str] = { + "app.py": _APP_PY, + ".env.example": _ENV_EXAMPLE, + "README.md": _README, + "docker-compose.yml": _DOCKER_COMPOSE, + "scripts/doctor.py": _SCRIPT_DOCTOR, + "scripts/test_text_turn.py": _SCRIPT_TEXT_TURN, + "scripts/test_outbound_call.py": _SCRIPT_OUTBOUND, +} + + +def scaffold(target_dir: Path | str, *, force: bool = False) -> list[Path]: + """Write the project files under ``target_dir``. + + Args: + target_dir: Destination directory (created if missing). + force: Overwrite existing files. When ``False`` (default), existing + files are left untouched and skipped. + + Returns: + The list of paths that were written (skipped files are excluded). 
+ """ + root = Path(target_dir) + written: list[Path] = [] + for rel, content in FILES.items(): + dest = root / rel + if dest.exists() and not force: + continue + dest.parent.mkdir(parents=True, exist_ok=True) + dest.write_text(content, encoding="utf-8") + written.append(dest) + return written diff --git a/libraries/python/getpatter/cli.py b/libraries/python/getpatter/cli.py index 09ea2d7..204a061 100644 --- a/libraries/python/getpatter/cli.py +++ b/libraries/python/getpatter/cli.py @@ -32,12 +32,19 @@ def main() -> None: build_eval_parser(subparsers) + # patter hermes {doctor|setup|attach-number|numbers} + from getpatter.cli_hermes import build_hermes_parser, dispatch_hermes + + build_hermes_parser(subparsers) + args = parser.parse_args() if args.command == "dashboard": asyncio.run(_run_dashboard(args.port)) elif args.command == "eval": sys.exit(dispatch_eval(args)) + elif args.command == "hermes": + sys.exit(dispatch_hermes(args)) else: parser.print_help() sys.exit(1) diff --git a/libraries/python/getpatter/cli_hermes.py b/libraries/python/getpatter/cli_hermes.py new file mode 100644 index 0000000..1c3c603 --- /dev/null +++ b/libraries/python/getpatter/cli_hermes.py @@ -0,0 +1,649 @@ +"""``patter hermes ...`` — zero-config setup, diagnostics, and Twilio wiring +for the Hermes voice shell (Direction A: Patter is the voice, Hermes is the +brain). + +Subcommands: + +* ``patter hermes doctor`` — preflight checks across the Hermes gateway, the + Patter providers, and the carrier, each with a suggested fix. +* ``patter hermes setup`` — scaffold a ready-to-run ``hermes-phone-agent`` + project, run the checks, and optionally attach a Twilio number. +* ``patter hermes attach-number`` — point a Twilio number's voice webhook at + your Patter URL. +* ``patter hermes numbers`` — list the Twilio numbers on your account. + +Live probes (gateway / Twilio API) are best-effort and time-bounded; pass +``--no-network`` to skip them. 
Nothing is mutated unless you ask for it +(``setup`` prompts before writing; ``attach-number`` is an explicit command). +""" + +from __future__ import annotations + +import argparse +import importlib.util +import json +import os +import shutil +import sys +from dataclasses import dataclass, field +from pathlib import Path + +# Check statuses. +OK = "ok" +WARN = "warn" +FAIL = "fail" +SKIP = "skip" + +_SYMBOL = {OK: "✓", WARN: "!", FAIL: "✗", SKIP: "·"} + + +def _color(text: str, status: str) -> str: + """Colorize a status symbol unless NO_COLOR is set or output isn't a tty.""" + if os.environ.get("NO_COLOR") or not sys.stdout.isatty(): + return text + code = {OK: "32", WARN: "33", FAIL: "31", SKIP: "90"}.get(status, "0") + return f"\033[{code}m{text}\033[0m" + + +@dataclass +class Check: + """One diagnostic result.""" + + status: str + label: str + detail: str = "" + fix: str = "" + + +@dataclass +class Section: + """A named group of checks.""" + + title: str + checks: list[Check] = field(default_factory=list) + + +# ────────────────────────────────────────────────────────────────────────── +# Hermes gateway base URL resolution (mirrors HermesLLM defaults) +# ────────────────────────────────────────────────────────────────────────── +def _hermes_base_url(override: str | None) -> str: + if override: + return override.rstrip("/") + host = os.environ.get("API_SERVER_HOST", "127.0.0.1") + port = os.environ.get("API_SERVER_PORT", "8642") + return f"http://{host}:{port}/v1" + + +def _get_json(url: str, *, headers: dict | None = None, timeout: float = 4.0): + """Best-effort sync GET returning ``(status_code, json_or_none, error)``.""" + try: + import httpx + except ImportError: # pragma: no cover - httpx is a core dep + return None, None, "httpx not installed" + try: + resp = httpx.get(url, headers=headers or {}, timeout=timeout) + except Exception as exc: # noqa: BLE001 - surface any connection failure + return None, None, str(exc) + try: + body = resp.json() + except 
Exception: # noqa: BLE001 - non-JSON body + body = None + return resp.status_code, body, "" + + +# ────────────────────────────────────────────────────────────────────────── +# Check groups +# ────────────────────────────────────────────────────────────────────────── +def _check_hermes(base_url: str, *, network: bool) -> Section: + sec = Section("Hermes") + + if shutil.which("hermes"): + sec.checks.append(Check(OK, "CLI found", "hermes on PATH")) + else: + sec.checks.append( + Check( + WARN, + "CLI not found", + "optional — only the gateway is required", + "install Hermes: https://github.com/NousResearch/hermes-agent", + ) + ) + + key = os.environ.get("API_SERVER_KEY", "") + if key: + sec.checks.append(Check(OK, "API_SERVER_KEY set")) + else: + sec.checks.append( + Check( + WARN, + "API_SERVER_KEY not set", + "keyless local gateways work, but a key is recommended", + 'export API_SERVER_KEY="choose-a-strong-key"', + ) + ) + + if not network: + sec.checks.append(Check(SKIP, "Gateway reachable", "skipped (--no-network)")) + return sec + + headers = {"Authorization": f"Bearer {key}"} if key else {} + status, body, err = _get_json(f"{base_url}/models", headers=headers) + if status == 200: + sec.checks.append(Check(OK, "Gateway reachable", base_url)) + want = os.environ.get("API_SERVER_MODEL_NAME", "hermes-agent") + ids = _model_ids(body) + if want in ids: + sec.checks.append(Check(OK, "Model available", want)) + elif ids: + sec.checks.append( + Check( + WARN, + "Model not found", + f"{want!r} missing; saw {', '.join(sorted(ids)[:5])}", + f'set API_SERVER_MODEL_NAME to one of the served models', + ) + ) + else: + sec.checks.append( + Check(WARN, "Model list empty", "gateway returned no models") + ) + elif status in (401, 403): + sec.checks.append( + Check( + FAIL, + "Gateway rejected key", + f"HTTP {status} from {base_url}/models", + "check API_SERVER_KEY matches the gateway", + ) + ) + else: + detail = f"HTTP {status}" if status else (err or "no response") + 
sec.checks.append( + Check( + FAIL, + "Gateway unreachable", + f"{base_url} — {detail}", + "enable + start the gateway (API_SERVER_ENABLED=true), " + "or pass --base-url", + ) + ) + return sec + + +def _model_ids(body) -> set[str]: + """Extract model ids from an OpenAI-style ``/models`` payload.""" + if not isinstance(body, dict): + return set() + data = body.get("data") + if not isinstance(data, list): + return set() + return {m.get("id") for m in data if isinstance(m, dict) and m.get("id")} + + +def _check_patter() -> Section: + sec = Section("Patter") + + try: + from getpatter import __version__ + + sec.checks.append(Check(OK, "getpatter installed", __version__)) + except Exception as exc: # noqa: BLE001 + sec.checks.append(Check(FAIL, "getpatter import failed", str(exc))) + return sec + + try: + from getpatter import HermesLLM + + HermesLLM() + sec.checks.append(Check(OK, "HermesLLM constructible")) + except Exception as exc: # noqa: BLE001 + sec.checks.append(Check(FAIL, "HermesLLM construction failed", str(exc))) + + sec.checks.append(_env_key("DEEPGRAM_API_KEY", "Deepgram STT")) + sec.checks.append(_env_key("ELEVENLABS_API_KEY", "ElevenLabs TTS")) + + transport = os.environ.get("PATTER_ELEVENLABS_TRANSPORT", "").lower() + if transport == "rest": + sec.checks.append(Check(OK, "ElevenLabs transport", "rest")) + elif transport == "ws": + sec.checks.append( + Check( + WARN, + "ElevenLabs transport", + "ws — WebSocket can stall before the first frame on PSTN", + "PATTER_ELEVENLABS_TRANSPORT=rest for a more robust demo", + ) + ) + else: + sec.checks.append( + Check(OK, "ElevenLabs transport", "unset (example defaults to REST)") + ) + + if importlib.util.find_spec("onnxruntime") is not None: + sec.checks.append(Check(OK, "Silero VAD available")) + else: + sec.checks.append( + Check( + WARN, + "Silero VAD missing", + "only needed for the pipeline VAD", + 'pip install "getpatter[silero]"', + ) + ) + return sec + + +def _env_key(var: str, label: str) -> Check: + if 
os.environ.get(var): + return Check(OK, f"{label} key found") + return Check( + WARN, + f"{label} key missing", + f"{var} not set", + f"export {var}=...", + ) + + +def _check_twilio(*, network: bool) -> Section: + sec = Section("Twilio") + sid = os.environ.get("TWILIO_ACCOUNT_SID", "") + token = os.environ.get("TWILIO_AUTH_TOKEN", "") + if not sid or not token: + sec.checks.append( + Check( + WARN, + "Carrier credentials missing", + "TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN not set", + "set them, or use Telnyx/Plivo instead", + ) + ) + return sec + sec.checks.append(Check(OK, "Credentials present")) + + if not network: + sec.checks.append(Check(SKIP, "Credentials valid", "skipped (--no-network)")) + return sec + + try: + import httpx + except ImportError: # pragma: no cover + sec.checks.append(Check(SKIP, "Credentials valid", "httpx not installed")) + return sec + + base = f"https://api.twilio.com/2010-04-01/Accounts/{sid}" + try: + resp = httpx.get(f"{base}.json", auth=(sid, token), timeout=6.0) + except Exception as exc: # noqa: BLE001 + sec.checks.append(Check(FAIL, "Twilio API unreachable", str(exc))) + return sec + if resp.status_code == 200: + sec.checks.append(Check(OK, "Credentials valid")) + else: + sec.checks.append( + Check( + FAIL, + "Credentials rejected", + f"HTTP {resp.status_code}", + "check TWILIO_ACCOUNT_SID / TWILIO_AUTH_TOKEN", + ) + ) + return sec + + number = os.environ.get("PATTER_PHONE_NUMBER") or os.environ.get( + "TWILIO_PHONE_NUMBER", "" + ) + if not number: + sec.checks.append( + Check(SKIP, "Webhook configured", "set PATTER_PHONE_NUMBER to check") + ) + return sec + try: + resp = httpx.get( + f"{base}/IncomingPhoneNumbers.json", + params={"PhoneNumber": number}, + auth=(sid, token), + timeout=6.0, + ) + rows = resp.json().get("incoming_phone_numbers", []) if resp.status_code == 200 else [] + except Exception as exc: # noqa: BLE001 + sec.checks.append(Check(WARN, "Webhook check failed", str(exc))) + return sec + if not rows: + 
sec.checks.append( + Check( + WARN, + "Number not on account", + f"{number} not found", + "buy/port the number in Twilio, or fix PATTER_PHONE_NUMBER", + ) + ) + return sec + voice_url = rows[0].get("voice_url", "") + if voice_url: + sec.checks.append(Check(OK, "Webhook configured", voice_url)) + else: + sec.checks.append( + Check( + WARN, + "Webhook not configured", + f"{number} has no voice webhook", + f'patter hermes attach-number {number} --url https:///calls/inbound', + ) + ) + return sec + + +# ────────────────────────────────────────────────────────────────────────── +# Rendering +# ────────────────────────────────────────────────────────────────────────── +def _print_sections(sections: list[Section]) -> None: + for sec in sections: + print(f"\n{sec.title}") + for c in sec.checks: + sym = _color(_SYMBOL.get(c.status, "?"), c.status) + line = f" {sym} {c.label}" + if c.detail: + line += f": {c.detail}" + print(line) + if c.fix and c.status in (WARN, FAIL): + print(f" fix: {c.fix}") + + +def _sections_to_dict(sections: list[Section]) -> dict: + return { + "sections": [ + { + "title": s.title, + "checks": [ + { + "status": c.status, + "label": c.label, + "detail": c.detail, + "fix": c.fix, + } + for c in s.checks + ], + } + for s in sections + ], + "failures": sum( + 1 for s in sections for c in s.checks if c.status == FAIL + ), + "warnings": sum( + 1 for s in sections for c in s.checks if c.status == WARN + ), + } + + +def _run_doctor(args: argparse.Namespace) -> list[Section]: + base_url = _hermes_base_url(getattr(args, "base_url", None)) + network = not getattr(args, "no_network", False) + return [ + _check_hermes(base_url, network=network), + _check_patter(), + _check_twilio(network=network), + ] + + +# ────────────────────────────────────────────────────────────────────────── +# Subcommands +# ────────────────────────────────────────────────────────────────────────── +def cmd_doctor(args: argparse.Namespace) -> int: + sections = _run_doctor(args) + report = 
_sections_to_dict(sections) + if getattr(args, "json", False): + print(json.dumps(report, indent=2)) + else: + _print_sections(sections) + print() + if report["failures"]: + print( + f"{report['failures']} problem(s) to fix, " + f"{report['warnings']} warning(s)." + ) + elif report["warnings"]: + print(f"Ready, with {report['warnings']} warning(s).") + else: + print("All checks passed. You're ready to take calls.") + return 1 if report["failures"] else 0 + + +def cmd_setup(args: argparse.Namespace) -> int: + from getpatter import _hermes_scaffold + + target = Path(args.dir).resolve() + interactive = sys.stdin.isatty() and not args.yes + + print(f"Patter + Hermes setup\n project: {target}\n") + + # 1. Preflight. + print("Checking your environment…") + sections = _run_doctor(args) + _print_sections(sections) + failures = sum(1 for s in sections for c in s.checks if c.status == FAIL) + print() + + # 2. Scaffold the project. + if interactive and not _confirm(f"Scaffold the project into {target}?"): + print("Skipped scaffolding.") + else: + written = _hermes_scaffold.scaffold(target, force=args.force) + if written: + print(f"Wrote {len(written)} file(s):") + for p in written: + print(f" + {p.relative_to(target)}") + else: + print("Project already exists (use --force to overwrite).") + env_path = target / ".env" + if not env_path.exists(): + (env_path).write_text( + (target / ".env.example").read_text(encoding="utf-8"), + encoding="utf-8", + ) + print(" + .env (from .env.example — fill in your keys)") + print() + + # 3. Optionally attach a Twilio number. + if args.number and args.url: + print(f"Attaching {args.number} → {args.url}") + rc = _attach_number(args.number, args.url, args.status_callback) + if rc != 0: + return rc + elif args.number or args.url: + print( + "Note: pass both --number and --url to auto-configure the Twilio " + "webhook, or run `patter hermes attach-number` later." + ) + + # 4. Next steps. 
+ print("\nNext steps:") + print(f" cd {target}") + print(" # edit .env with your keys") + print(" python scripts/test_text_turn.py # smoke-test the Hermes brain") + print(" python app.py # answer the phone") + if not (args.number and args.url): + print( + " patter hermes attach-number --url /calls/inbound" + ) + return 1 if failures else 0 + + +def cmd_attach_number(args: argparse.Namespace) -> int: + return _attach_number(args.number, args.url, args.status_callback) + + +def cmd_numbers(args: argparse.Namespace) -> int: + sid, token = _twilio_creds() + if not sid or not token: + print( + "Twilio credentials not found. Set TWILIO_ACCOUNT_SID and " + "TWILIO_AUTH_TOKEN.", + file=sys.stderr, + ) + return 2 + try: + import httpx + + resp = httpx.get( + f"https://api.twilio.com/2010-04-01/Accounts/{sid}/IncomingPhoneNumbers.json", + auth=(sid, token), + params={"PageSize": 50}, + timeout=10.0, + ) + except Exception as exc: # noqa: BLE001 + print(f"Twilio API error: {exc}", file=sys.stderr) + return 1 + if resp.status_code != 200: + print(f"Twilio returned HTTP {resp.status_code}", file=sys.stderr) + return 1 + rows = resp.json().get("incoming_phone_numbers", []) + if not rows: + print("No phone numbers on this account.") + return 0 + print(f"{len(rows)} number(s):") + for r in rows: + num = r.get("phone_number", "?") + url = r.get("voice_url", "") or "(no voice webhook)" + print(f" {num} → {url}") + return 0 + + +# ────────────────────────────────────────────────────────────────────────── +# Twilio helpers +# ────────────────────────────────────────────────────────────────────────── +def _twilio_creds() -> tuple[str, str]: + return ( + os.environ.get("TWILIO_ACCOUNT_SID", ""), + os.environ.get("TWILIO_AUTH_TOKEN", ""), + ) + + +def _attach_number(number: str, url: str, status_callback: str | None) -> int: + """Set a Twilio number's voice webhook. 
Returns a process exit code.""" + sid, token = _twilio_creds() + if not sid or not token: + print( + "Twilio credentials not found. Set TWILIO_ACCOUNT_SID and " + "TWILIO_AUTH_TOKEN.", + file=sys.stderr, + ) + return 2 + if not url.lower().startswith("https://"): + print(f"Webhook URL must be https:// (got {url!r})", file=sys.stderr) + return 2 + try: + import httpx + except ImportError: # pragma: no cover + print("httpx is required for attach-number.", file=sys.stderr) + return 1 + + base = f"https://api.twilio.com/2010-04-01/Accounts/{sid}" + # Resolve the number's SID. + try: + lookup = httpx.get( + f"{base}/IncomingPhoneNumbers.json", + params={"PhoneNumber": number}, + auth=(sid, token), + timeout=10.0, + ) + except Exception as exc: # noqa: BLE001 + print(f"Twilio API error: {exc}", file=sys.stderr) + return 1 + if lookup.status_code != 200: + print(f"Twilio returned HTTP {lookup.status_code}", file=sys.stderr) + return 1 + rows = lookup.json().get("incoming_phone_numbers", []) + if not rows: + print( + f"{number} is not on this account. 
Run `patter hermes numbers` to " + "list available numbers.", + file=sys.stderr, + ) + return 1 + number_sid = rows[0].get("sid") + + data = {"VoiceUrl": url, "VoiceMethod": "POST"} + if status_callback: + data["StatusCallback"] = status_callback + data["StatusCallbackMethod"] = "POST" + try: + upd = httpx.post( + f"{base}/IncomingPhoneNumbers/{number_sid}.json", + data=data, + auth=(sid, token), + timeout=10.0, + ) + except Exception as exc: # noqa: BLE001 + print(f"Twilio API error: {exc}", file=sys.stderr) + return 1 + if upd.status_code in (200, 201): + print(f"✓ {number} voice webhook → {url}") + if status_callback: + print(f"✓ status callback → {status_callback}") + return 0 + print( + f"Failed to update webhook: HTTP {upd.status_code} {upd.text[:200]}", + file=sys.stderr, + ) + return 1 + + +def _confirm(prompt: str) -> bool: + try: + return input(f"{prompt} [Y/n] ").strip().lower() in ("", "y", "yes") + except (EOFError, KeyboardInterrupt): + return False + + +# ────────────────────────────────────────────────────────────────────────── +# Parser wiring +# ────────────────────────────────────────────────────────────────────────── +def build_hermes_parser(subparsers: argparse._SubParsersAction) -> argparse.ArgumentParser: + """Attach the ``hermes`` subcommand tree to the parent CLI.""" + hermes = subparsers.add_parser( + "hermes", + help="Set up, diagnose, and wire the Hermes voice shell", + ) + hsub = hermes.add_subparsers(dest="hermes_command") + + doctor = hsub.add_parser("doctor", help="Preflight checks for the Hermes voice shell") + doctor.add_argument("--base-url", default=None, help="Hermes gateway base URL") + doctor.add_argument("--no-network", action="store_true", help="Skip live probes") + doctor.add_argument("--json", action="store_true", help="Machine-readable output") + + setup = hsub.add_parser("setup", help="Scaffold a hermes-phone-agent project") + setup.add_argument("--dir", default="hermes-phone-agent", help="Target directory") + 
setup.add_argument("--force", action="store_true", help="Overwrite existing files") + setup.add_argument("--yes", action="store_true", help="Non-interactive (assume yes)") + setup.add_argument("--number", default=None, help="Twilio number to attach") + setup.add_argument("--url", default=None, help="Public webhook URL to attach") + setup.add_argument("--status-callback", default=None, help="Twilio status callback URL") + setup.add_argument("--base-url", default=None, help="Hermes gateway base URL") + setup.add_argument("--no-network", action="store_true", help="Skip live probes") + + attach = hsub.add_parser("attach-number", help="Point a Twilio number at your Patter URL") + attach.add_argument("number", help="Phone number in E.164 (e.g. +15551234567)") + attach.add_argument("--url", required=True, help="Public voice webhook URL (https)") + attach.add_argument("--status-callback", default=None, help="Twilio status callback URL") + + hsub.add_parser("numbers", help="List the Twilio numbers on your account") + return hermes + + +def dispatch_hermes(args: argparse.Namespace) -> int: + """Entry for ``patter hermes ...``. Returns a process exit code.""" + command = getattr(args, "hermes_command", None) + if command == "doctor": + return cmd_doctor(args) + if command == "setup": + return cmd_setup(args) + if command == "attach-number": + return cmd_attach_number(args) + if command == "numbers": + return cmd_numbers(args) + print( + "Usage: patter hermes {doctor|setup|attach-number|numbers}\n" + "Try: patter hermes doctor", + file=sys.stderr, + ) + return 2 diff --git a/libraries/python/tests/unit/test_hermes_cli.py b/libraries/python/tests/unit/test_hermes_cli.py new file mode 100644 index 0000000..68298f5 --- /dev/null +++ b/libraries/python/tests/unit/test_hermes_cli.py @@ -0,0 +1,184 @@ +"""Unit tests for the ``patter hermes ...`` CLI (doctor / setup / attach-number). 
+ +Live probes are exercised by monkeypatching ``httpx`` at the boundary, so no +real Hermes gateway or Twilio account is touched. +""" + +from __future__ import annotations + +import argparse +from pathlib import Path + +import pytest + +from getpatter import _hermes_scaffold, cli_hermes + + +# ── scaffold ─────────────────────────────────────────────────────────────── +def test_scaffold_writes_all_files(tmp_path: Path) -> None: + written = _hermes_scaffold.scaffold(tmp_path) + rel = {p.relative_to(tmp_path).as_posix() for p in written} + assert rel == set(_hermes_scaffold.FILES) + for name in _hermes_scaffold.FILES: + assert (tmp_path / name).exists() + + +def test_scaffold_skips_existing_without_force(tmp_path: Path) -> None: + (tmp_path / "app.py").write_text("# mine", encoding="utf-8") + written = _hermes_scaffold.scaffold(tmp_path) + assert tmp_path / "app.py" not in written + assert (tmp_path / "app.py").read_text(encoding="utf-8") == "# mine" + # force overwrites. + written2 = _hermes_scaffold.scaffold(tmp_path, force=True) + assert tmp_path / "app.py" in written2 + assert (tmp_path / "app.py").read_text(encoding="utf-8") != "# mine" + + +def test_committed_example_matches_scaffold() -> None: + """The committed examples/ tree must stay in sync with the scaffold map.""" + root = Path(__file__).resolve().parents[4] / "examples" / "hermes-phone-agent" + assert root.is_dir(), f"missing example dir: {root}" + for rel, content in _hermes_scaffold.FILES.items(): + on_disk = (root / rel).read_text(encoding="utf-8") + assert on_disk == content, f"{rel} drifted from the scaffold" + + +# ── doctor ───────────────────────────────────────────────────────────────── +def _doctor_args(**over) -> argparse.Namespace: + base = {"base_url": None, "no_network": True, "json": True} + base.update(over) + return argparse.Namespace(**base) + + +def test_doctor_no_network_skips_probes(capsys, monkeypatch) -> None: + for var in ("API_SERVER_KEY", "DEEPGRAM_API_KEY", 
"ELEVENLABS_API_KEY"): + monkeypatch.delenv(var, raising=False) + rc = cli_hermes.cmd_doctor(_doctor_args()) + out = capsys.readouterr().out + assert '"skipped (--no-network)"' in out + # warnings don't fail the run. + assert rc == 0 + + +def test_doctor_gateway_unreachable_is_failure(monkeypatch) -> None: + # Force the gateway probe to look like a refused connection. + def boom(*_a, **_k): + raise OSError("Connection refused") + + monkeypatch.setattr("httpx.get", boom) + sections = cli_hermes._check_hermes("http://127.0.0.1:8642/v1", network=True) + statuses = {c.label: c.status for c in sections.checks} + assert statuses["Gateway unreachable"] == cli_hermes.FAIL + + +def test_doctor_gateway_ok_and_model_present(monkeypatch) -> None: + class Resp: + status_code = 200 + + @staticmethod + def json(): + return {"data": [{"id": "hermes-agent"}]} + + monkeypatch.setenv("API_SERVER_KEY", "k") + monkeypatch.setattr("httpx.get", lambda *a, **k: Resp()) + sec = cli_hermes._check_hermes("http://127.0.0.1:8642/v1", network=True) + labels = {c.label: c.status for c in sec.checks} + assert labels["Gateway reachable"] == cli_hermes.OK + assert labels["Model available"] == cli_hermes.OK + + +def test_doctor_exit_code_one_when_failures(monkeypatch) -> None: + monkeypatch.setattr( + cli_hermes, + "_run_doctor", + lambda _a: [cli_hermes.Section("X", [cli_hermes.Check(cli_hermes.FAIL, "bad")])], + ) + assert cli_hermes.cmd_doctor(_doctor_args(json=False)) == 1 + + +# ── attach-number ────────────────────────────────────────────────────────── +def test_attach_number_requires_https(monkeypatch, capsys) -> None: + monkeypatch.setenv("TWILIO_ACCOUNT_SID", "AC123") + monkeypatch.setenv("TWILIO_AUTH_TOKEN", "tok") + rc = cli_hermes._attach_number("+15551234567", "http://x/y", None) + assert rc == 2 + assert "https" in capsys.readouterr().err + + +def test_attach_number_missing_creds(monkeypatch, capsys) -> None: + monkeypatch.delenv("TWILIO_ACCOUNT_SID", raising=False) + 
monkeypatch.delenv("TWILIO_AUTH_TOKEN", raising=False) + rc = cli_hermes._attach_number("+15551234567", "https://x/y", None) + assert rc == 2 + assert "credentials not found" in capsys.readouterr().err.lower() + + +def test_attach_number_posts_voice_url(monkeypatch, capsys) -> None: + monkeypatch.setenv("TWILIO_ACCOUNT_SID", "AC123") + monkeypatch.setenv("TWILIO_AUTH_TOKEN", "tok") + posted: dict = {} + + class Lookup: + status_code = 200 + + @staticmethod + def json(): + return {"incoming_phone_numbers": [{"sid": "PN1"}]} + + class Update: + status_code = 200 + text = "" + + def fake_get(url, **kw): + assert "IncomingPhoneNumbers.json" in url + return Lookup() + + def fake_post(url, **kw): + posted["url"] = url + posted["data"] = kw.get("data") + return Update() + + monkeypatch.setattr("httpx.get", fake_get) + monkeypatch.setattr("httpx.post", fake_post) + rc = cli_hermes._attach_number( + "+15551234567", "https://abc.example.com/calls/inbound", None + ) + assert rc == 0 + assert "PN1.json" in posted["url"] + assert posted["data"]["VoiceUrl"] == "https://abc.example.com/calls/inbound" + assert posted["data"]["VoiceMethod"] == "POST" + assert "voice webhook" in capsys.readouterr().out + + +def test_attach_number_unknown_number(monkeypatch, capsys) -> None: + monkeypatch.setenv("TWILIO_ACCOUNT_SID", "AC123") + monkeypatch.setenv("TWILIO_AUTH_TOKEN", "tok") + + class Lookup: + status_code = 200 + + @staticmethod + def json(): + return {"incoming_phone_numbers": []} + + monkeypatch.setattr("httpx.get", lambda *a, **k: Lookup()) + rc = cli_hermes._attach_number("+15550000000", "https://x/y", None) + assert rc == 1 + assert "not on this account" in capsys.readouterr().err + + +# ── dispatch ─────────────────────────────────────────────────────────────── +def test_dispatch_unknown_subcommand_returns_usage() -> None: + args = argparse.Namespace(hermes_command=None) + assert cli_hermes.dispatch_hermes(args) == 2 + + +def test_parser_wires_subcommands() -> None: + parser = 
argparse.ArgumentParser() + sub = parser.add_subparsers(dest="command") + cli_hermes.build_hermes_parser(sub) + ns = parser.parse_args(["hermes", "attach-number", "+15551234567", "--url", "https://x/y"]) + assert ns.command == "hermes" + assert ns.hermes_command == "attach-number" + assert ns.number == "+15551234567" + assert ns.url == "https://x/y" diff --git a/libraries/typescript/src/cli.ts b/libraries/typescript/src/cli.ts index fd83779..0d8fd77 100644 --- a/libraries/typescript/src/cli.ts +++ b/libraries/typescript/src/cli.ts @@ -43,15 +43,33 @@ function printEvalStub(): void { ); } +function printHermesStub(): void { + console.log( + 'The Hermes setup wizard (doctor / setup / attach-number) lives in the\n' + + 'Python CLI today. Use it from the Python SDK:\n\n' + + ' pip install getpatter\n' + + ' patter hermes doctor\n' + + ' patter hermes setup\n\n' + + 'The HermesLLM provider itself is fully available in this TypeScript SDK\n' + + "(import { HermesLLM } from 'getpatter'). See\n" + + 'https://docs.getpatter.com/integrations/hermes for docs.', + ); +} + async function main(): Promise { const command = process.argv[2]; if (command === 'eval') { printEvalStub(); process.exit(0); } + if (command === 'hermes') { + printHermesStub(); + process.exit(0); + } if (command !== 'dashboard') { console.log('Usage: getpatter dashboard [--port 8000]'); console.log(' getpatter eval (stub — use Python SDK for evals)'); + console.log(' getpatter hermes (stub — use Python SDK for the wizard)'); process.exit(command ? 
1 : 0); } From 2b40d1f876c11e357845ee58e2190525fd51f75e Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 9 Jun 2026 08:35:34 +0000 Subject: [PATCH 09/11] =?UTF-8?q?feat(hermes):=20doctor=20&=20setup=20read?= =?UTF-8?q?=20real=20config=20=E2=80=94=20.env=20autoload,=20~/.hermes=20d?= =?UTF-8?q?etection,=20key-gen,=20--enable-hermes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes real config instead of only consulting os.environ. doctor: - Autoloads dotenv files before checking — ~/.hermes/.env then the project/cwd .env (non-overriding), with --env-file/--no-env-file to control it. Loaded paths are reported; secrets are never echoed. - Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED, surfaces the configured key/port/model, and runs `hermes gateway status` when the CLI is present. - Sharper severity: CLI missing AND gateway unreachable is now a failure, not a soft warning; gateway-down fix adapts to whether the CLI is available. setup: - --enable-hermes writes API_SERVER_ENABLED=true (and generates an API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first, then reminds the operator to restart the gateway. - --generate-key writes a strong key into the project .env; when used with --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a mismatch is a 401 at call time). - Autoloads env for the preflight so checks reflect the project's .env. New helpers (_parse_env_file / _upsert_env_file / _load_env_files / _read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps. +11 unit tests; docs updated. 
https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts --- docs/integrations/hermes.mdx | 25 +- libraries/python/getpatter/cli_hermes.py | 339 +++++++++++++++++- .../python/tests/unit/test_hermes_cli.py | 122 ++++++- 3 files changed, 466 insertions(+), 20 deletions(-) diff --git a/docs/integrations/hermes.mdx b/docs/integrations/hermes.mdx index 160c48c..ce7dccf 100644 --- a/docs/integrations/hermes.mdx +++ b/docs/integrations/hermes.mdx @@ -176,13 +176,26 @@ patter hermes doctor # preflight: gateway, providers, carrier — with fi patter hermes setup # scaffold ./hermes-phone-agent (app.py, .env, scripts) ``` -`patter hermes doctor` probes the Hermes gateway (`/v1/models`), confirms `HermesLLM` is -constructible, checks your Deepgram / ElevenLabs / Twilio credentials, and prints a -suggested fix for anything missing (`--no-network` skips live probes, `--json` for -machine-readable output). `patter hermes setup` writes the same starter project shown in +`patter hermes doctor` reads your Hermes config directly — it autoloads `~/.hermes/.env` +and the nearest project `.env`, reports whether `API_SERVER_ENABLED` is set and which +gateway port is configured, runs `hermes gateway status` when the CLI is present, then +probes the gateway (`/v1/models`), confirms `HermesLLM` is constructible, and checks your +Deepgram / ElevenLabs / Twilio credentials — printing a suggested fix for anything missing +(`--no-network` skips live probes, `--json` for machine-readable output, `--env-file` / +`--no-env-file` to control autoloading). + +`patter hermes setup` writes the same starter project shown in [`examples/hermes-phone-agent`](https://github.com/PatterAI/Patter/tree/main/examples/hermes-phone-agent) -and, given `--number` and `--url`, attaches the Twilio webhook for you. 
To wire an existing -number on its own: +and can also wire the two ends together for you: + +- `--enable-hermes` writes `API_SERVER_ENABLED=true` (and generates an `API_SERVER_KEY` if + absent) into `~/.hermes/.env`, backing the file up first — then reminds you to restart the + gateway. The same key is mirrored into the project `.env` so Patter and Hermes agree (a + mismatch is a 401 at call time). +- `--generate-key` puts a strong `API_SERVER_KEY` into the project `.env`. +- `--number` + `--url` attach the Twilio webhook in the same run. + +To wire an existing number on its own: ```bash patter hermes numbers # list the numbers on your Twilio account diff --git a/libraries/python/getpatter/cli_hermes.py b/libraries/python/getpatter/cli_hermes.py index 1c3c603..4e271a1 100644 --- a/libraries/python/getpatter/cli_hermes.py +++ b/libraries/python/getpatter/cli_hermes.py @@ -91,25 +91,225 @@ def _get_json(url: str, *, headers: dict | None = None, timeout: float = 4.0): return resp.status_code, body, "" +# ────────────────────────────────────────────────────────────────────────── +# .env / Hermes config helpers +# ────────────────────────────────────────────────────────────────────────── +def _parse_env_file(path: Path) -> dict[str, str]: + """Parse a ``KEY=VALUE`` dotenv file. Ignores blanks, comments, ``export``. + + Surrounding single/double quotes are stripped. Returns an empty dict if the + file is missing or unreadable. 
+ """ + out: dict[str, str] = {} + try: + text = path.read_text(encoding="utf-8") + except OSError: + return out + for raw in text.splitlines(): + line = raw.strip() + if not line or line.startswith("#") or "=" not in line: + continue + if line.startswith("export "): + line = line[len("export ") :].lstrip() + key, _, val = line.partition("=") + key = key.strip() + val = val.strip() + if len(val) >= 2 and val[0] == val[-1] and val[0] in ("'", '"'): + val = val[1:-1] + if key: + out[key] = val + return out + + +def _hermes_home() -> Path: + """Hermes config dir — ``$HERMES_HOME`` or ``~/.hermes`` (override for tests).""" + override = os.environ.get("HERMES_HOME") + return Path(override) if override else Path.home() / ".hermes" + + +def _read_hermes_config() -> dict[str, str]: + """Read Hermes ``api_server`` settings from ``~/.hermes/.env`` and + ``~/.hermes/config.yaml``. + + Returns a flat dict that may include ``API_SERVER_ENABLED``, + ``API_SERVER_KEY``, ``API_SERVER_HOST``, ``API_SERVER_PORT``, and + ``API_SERVER_MODEL_NAME``. The ``.env`` file wins over ``config.yaml``. + Returns an empty dict when ``~/.hermes`` is absent. 
+ """ + home = _hermes_home() + if not home.exists(): + return {} + cfg: dict[str, str] = {} + yaml_path = home / "config.yaml" + if yaml_path.exists(): + try: + import yaml + + data = yaml.safe_load(yaml_path.read_text(encoding="utf-8")) or {} + api = data.get("api_server") if isinstance(data, dict) else None + if isinstance(api, dict): + _map = { + "enabled": "API_SERVER_ENABLED", + "key": "API_SERVER_KEY", + "host": "API_SERVER_HOST", + "port": "API_SERVER_PORT", + "model_name": "API_SERVER_MODEL_NAME", + } + for src, dst in _map.items(): + if src in api and api[src] is not None: + cfg[dst] = str(api[src]) + except Exception: # noqa: BLE001 - malformed yaml shouldn't crash doctor + pass + cfg.update(_parse_env_file(home / ".env")) # .env overrides config.yaml + return cfg + + +def _env_files_to_load( + explicit: list[str] | None, *, project_dir: Path | None +) -> list[Path]: + """Resolve which dotenv files to autoload, in the order they are loaded.""" + if explicit: + return [Path(p) for p in explicit] + chain: list[Path] = [_hermes_home() / ".env"] + if project_dir is not None: + chain.append(project_dir / ".env") + chain.append(Path.cwd() / ".env") + return chain + + +def _load_env_files(paths: list[Path], *, override: bool = False) -> list[Path]: + """Load dotenv files into ``os.environ``. Returns the files actually applied. + + Existing ``os.environ`` values are kept unless ``override`` is set, so by + default the first file to define a key wins; with ``override``, later paths win. + """ + applied: list[Path] = [] + for path in paths: + values = _parse_env_file(path) + if not values: + continue + for key, val in values.items(): + if override or key not in os.environ: + os.environ[key] = val + applied.append(path) + return applied + + +def _upsert_env_file(path: Path, updates: dict[str, str]) -> None: + """Set ``KEY=VALUE`` pairs in a dotenv file, preserving other lines. + + Existing keys are replaced in place; new keys are appended. Creates the file + (and parent dir) if missing.
+ """ + path.parent.mkdir(parents=True, exist_ok=True) + lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else [] + remaining = dict(updates) + for i, raw in enumerate(lines): + stripped = raw.strip() + if not stripped or stripped.startswith("#") or "=" not in stripped: + continue + key = stripped.split("=", 1)[0].strip() + if key.startswith("export "): + key = key[len("export ") :].strip() + if key in remaining: + lines[i] = f"{key}={remaining.pop(key)}" + for key, val in remaining.items(): + lines.append(f"{key}={val}") + path.write_text("\n".join(lines) + "\n", encoding="utf-8") + + +def _generate_key() -> str: + """A strong, URL-safe API key.""" + import secrets + + return secrets.token_urlsafe(32) + + +def _hermes_gateway_status() -> Check | None: + """Best-effort ``hermes gateway status`` probe. ``None`` if CLI absent.""" + if not shutil.which("hermes"): + return None + import subprocess + + try: + proc = subprocess.run( + ["hermes", "gateway", "status"], + capture_output=True, + text=True, + timeout=8, + ) + except Exception: # noqa: BLE001 - missing subcommand / timeout + return None + if proc.returncode == 0: + first = (proc.stdout or proc.stderr or "").strip().splitlines() + return Check(OK, "Gateway service", first[0] if first else "status ok") + return Check( + WARN, + "Gateway service", + "hermes gateway status reported a problem", + "hermes gateway start", + ) + + # ────────────────────────────────────────────────────────────────────────── # Check groups # ────────────────────────────────────────────────────────────────────────── def _check_hermes(base_url: str, *, network: bool) -> Section: sec = Section("Hermes") + have_cli = bool(shutil.which("hermes")) + hermes_cfg = _read_hermes_config() + home = _hermes_home() - if shutil.which("hermes"): + # CLI presence (severity finalised at the end once we know gateway state). 
+ if have_cli: sec.checks.append(Check(OK, "CLI found", "hermes on PATH")) else: sec.checks.append( Check( WARN, "CLI not found", - "optional — only the gateway is required", + "optional when the gateway is already running", "install Hermes: https://github.com/NousResearch/hermes-agent", ) ) - key = os.environ.get("API_SERVER_KEY", "") + # Hermes-side config (~/.hermes/.env + config.yaml), read directly. + if hermes_cfg: + enabled = hermes_cfg.get("API_SERVER_ENABLED", "").lower() + if enabled in ("true", "1", "yes"): + sec.checks.append( + Check(OK, "API server enabled", f"API_SERVER_ENABLED=true in {home}") + ) + elif enabled: + sec.checks.append( + Check( + WARN, + "API server disabled", + f"API_SERVER_ENABLED={enabled!r} in {home}", + "patter hermes setup --enable-hermes", + ) + ) + else: + sec.checks.append( + Check( + WARN, + "API server flag absent", + f"API_SERVER_ENABLED not set in {home}", + "patter hermes setup --enable-hermes", + ) + ) + else: + sec.checks.append( + Check(SKIP, "Hermes config", f"no {home} directory found") + ) + + # Gateway service status via the CLI (best-effort). + gw_status = _hermes_gateway_status() + if gw_status is not None: + sec.checks.append(gw_status) + + # A key may live in the process env or in ~/.hermes — accept either. 
+ key = os.environ.get("API_SERVER_KEY", "") or hermes_cfg.get("API_SERVER_KEY", "") if key: sec.checks.append(Check(OK, "API_SERVER_KEY set")) else: @@ -118,7 +318,7 @@ def _check_hermes(base_url: str, *, network: bool) -> Section: WARN, "API_SERVER_KEY not set", "keyless local gateways work, but a key is recommended", - 'export API_SERVER_KEY="choose-a-strong-key"', + "patter hermes setup --generate-key", ) ) @@ -130,7 +330,9 @@ def _check_hermes(base_url: str, *, network: bool) -> Section: status, body, err = _get_json(f"{base_url}/models", headers=headers) if status == 200: sec.checks.append(Check(OK, "Gateway reachable", base_url)) - want = os.environ.get("API_SERVER_MODEL_NAME", "hermes-agent") + want = os.environ.get("API_SERVER_MODEL_NAME") or hermes_cfg.get( + "API_SERVER_MODEL_NAME", "hermes-agent" + ) ids = _model_ids(body) if want in ids: sec.checks.append(Check(OK, "Model available", want)) @@ -140,7 +342,7 @@ def _check_hermes(base_url: str, *, network: bool) -> Section: WARN, "Model not found", f"{want!r} missing; saw {', '.join(sorted(ids)[:5])}", - f'set API_SERVER_MODEL_NAME to one of the served models', + "set API_SERVER_MODEL_NAME to one of the served models", ) ) else: @@ -158,15 +360,26 @@ def _check_hermes(base_url: str, *, network: bool) -> Section: ) else: detail = f"HTTP {status}" if status else (err or "no response") + # Gateway down with no CLI to start it is a hard stop — promote the + # CLI-not-found note to a failure so the verdict isn't a soft warning. 
+ fix = ( + "hermes gateway start" + if have_cli + else "install Hermes + start the gateway (API_SERVER_ENABLED=true)" + ) sec.checks.append( Check( FAIL, "Gateway unreachable", f"{base_url} — {detail}", - "enable + start the gateway (API_SERVER_ENABLED=true), " - "or pass --base-url", + fix + ", or pass --base-url", ) ) + if not have_cli: + for c in sec.checks: + if c.label == "CLI not found": + c.status = FAIL + c.detail = "and the gateway is unreachable" return sec @@ -388,12 +601,24 @@ def _run_doctor(args: argparse.Namespace) -> list[Section]: # ────────────────────────────────────────────────────────────────────────── # Subcommands # ────────────────────────────────────────────────────────────────────────── +def _apply_env(args: argparse.Namespace, *, project_dir: Path | None = None) -> list[Path]: + """Autoload dotenv files for a command unless ``--no-env-file`` was passed.""" + if getattr(args, "no_env_file", False): + return [] + paths = _env_files_to_load(getattr(args, "env_file", None), project_dir=project_dir) + return _load_env_files(paths) + + def cmd_doctor(args: argparse.Namespace) -> int: + loaded = _apply_env(args) sections = _run_doctor(args) report = _sections_to_dict(sections) if getattr(args, "json", False): + report["loaded_env_files"] = [str(p) for p in loaded] print(json.dumps(report, indent=2)) else: + if loaded: + print("Loaded env from: " + ", ".join(str(p) for p in loaded)) _print_sections(sections) print() if report["failures"]: @@ -416,14 +641,26 @@ def cmd_setup(args: argparse.Namespace) -> int: print(f"Patter + Hermes setup\n project: {target}\n") - # 1. Preflight. + # 0. Optionally enable the Hermes API server in ~/.hermes/.env. This is the + # one step that writes to your Hermes install — explicit opt-in, backed up. + gateway_key = "" + if getattr(args, "enable_hermes", False): + gateway_key = _enable_hermes_gateway() + print() + + # 1. Load any existing env (project .env, ~/.hermes/.env) for the preflight. 
+ loaded = _apply_env(args, project_dir=target) + if loaded: + print("Loaded env from: " + ", ".join(str(p) for p in loaded)) + + # 2. Preflight. print("Checking your environment…") sections = _run_doctor(args) _print_sections(sections) failures = sum(1 for s in sections for c in s.checks if c.status == FAIL) print() - # 2. Scaffold the project. + # 3. Scaffold the project. if interactive and not _confirm(f"Scaffold the project into {target}?"): print("Skipped scaffolding.") else: @@ -441,9 +678,22 @@ def cmd_setup(args: argparse.Namespace) -> int: encoding="utf-8", ) print(" + .env (from .env.example — fill in your keys)") + # Put an API_SERVER_KEY into the project .env. When we just enabled the + # gateway, reuse ITS key so Patter and Hermes agree (a mismatch is a 401 + # at call time); otherwise generate a fresh one on --generate-key. + if gateway_key: + _upsert_env_file(env_path, {"API_SERVER_KEY": gateway_key}) + print(" + API_SERVER_KEY in .env (matches the gateway key)") + elif getattr(args, "generate_key", False): + existing = _parse_env_file(env_path).get("API_SERVER_KEY", "") + if existing and existing != "choose-a-strong-key" and not args.force: + print(" · API_SERVER_KEY already set (use --force to regenerate)") + else: + _upsert_env_file(env_path, {"API_SERVER_KEY": _generate_key()}) + print(" + API_SERVER_KEY generated in .env") print() - # 3. Optionally attach a Twilio number. + # 4. Optionally attach a Twilio number. if args.number and args.url: print(f"Attaching {args.number} → {args.url}") rc = _attach_number(args.number, args.url, args.status_callback) @@ -455,7 +705,7 @@ def cmd_setup(args: argparse.Namespace) -> int: "webhook, or run `patter hermes attach-number` later." ) - # 4. Next steps. + # 5. Next steps. 
print("\nNext steps:") print(f" cd {target}") print(" # edit .env with your keys") @@ -508,6 +758,46 @@ def cmd_numbers(args: argparse.Namespace) -> int: return 0 +# ────────────────────────────────────────────────────────────────────────── +# Hermes gateway enablement (the one step that writes to ~/.hermes) +# ────────────────────────────────────────────────────────────────────────── +def _enable_hermes_gateway() -> str: + """Write ``API_SERVER_ENABLED=true`` (+ a key if absent) to ``~/.hermes/.env``. + + Backs the file up to ``.env.bak`` first. Prints what it changed and reminds + the operator to (re)start the gateway — Patter does not manage the service. + Returns the gateway's ``API_SERVER_KEY`` (existing or freshly generated) so + the caller can keep the project ``.env`` in sync with it. + """ + home = _hermes_home() + env_path = home / ".env" + existing = _parse_env_file(env_path) + + if env_path.exists(): + backup = env_path.parent / (env_path.name + ".bak") + backup.write_text(env_path.read_text(encoding="utf-8"), encoding="utf-8") + print(f"Backed up {env_path} → {backup}") + + updates: dict[str, str] = {"API_SERVER_ENABLED": "true"} + key = existing.get("API_SERVER_KEY", "") + if not key: + key = _generate_key() + updates["API_SERVER_KEY"] = key + _upsert_env_file(env_path, updates) + + print(f"✓ API_SERVER_ENABLED=true written to {env_path}") + if "API_SERVER_KEY" in updates: + print("✓ API_SERVER_KEY generated for the gateway") + if shutil.which("hermes"): + print("Now (re)start the gateway: hermes gateway start") + else: + print( + "Hermes CLI not found — start the gateway your usual way so the new " + "settings take effect." 
+ ) + return key + + # ────────────────────────────────────────────────────────────────────────── # Twilio helpers # ────────────────────────────────────────────────────────────────────────── @@ -610,6 +900,15 @@ def build_hermes_parser(subparsers: argparse._SubParsersAction) -> argparse.Argu doctor.add_argument("--base-url", default=None, help="Hermes gateway base URL") doctor.add_argument("--no-network", action="store_true", help="Skip live probes") doctor.add_argument("--json", action="store_true", help="Machine-readable output") + doctor.add_argument( + "--env-file", + action="append", + default=None, + help="dotenv file(s) to load (repeatable; default: ~/.hermes/.env + ./.env)", + ) + doctor.add_argument( + "--no-env-file", action="store_true", help="Do not autoload any .env file" + ) setup = hsub.add_parser("setup", help="Scaffold a hermes-phone-agent project") setup.add_argument("--dir", default="hermes-phone-agent", help="Target directory") @@ -620,6 +919,22 @@ def build_hermes_parser(subparsers: argparse._SubParsersAction) -> argparse.Argu setup.add_argument("--status-callback", default=None, help="Twilio status callback URL") setup.add_argument("--base-url", default=None, help="Hermes gateway base URL") setup.add_argument("--no-network", action="store_true", help="Skip live probes") + setup.add_argument( + "--generate-key", + action="store_true", + help="Generate a strong API_SERVER_KEY into the project .env", + ) + setup.add_argument( + "--enable-hermes", + action="store_true", + help="Write API_SERVER_ENABLED=true (+ key) to ~/.hermes/.env (backed up)", + ) + setup.add_argument( + "--env-file", action="append", default=None, help="dotenv file(s) to load" + ) + setup.add_argument( + "--no-env-file", action="store_true", help="Do not autoload any .env file" + ) attach = hsub.add_parser("attach-number", help="Point a Twilio number at your Patter URL") attach.add_argument("number", help="Phone number in E.164 (e.g. 
+15551234567)") diff --git a/libraries/python/tests/unit/test_hermes_cli.py b/libraries/python/tests/unit/test_hermes_cli.py index 68298f5..5877dbd 100644 --- a/libraries/python/tests/unit/test_hermes_cli.py +++ b/libraries/python/tests/unit/test_hermes_cli.py @@ -7,10 +7,9 @@ from __future__ import annotations import argparse +import os from pathlib import Path -import pytest - from getpatter import _hermes_scaffold, cli_hermes @@ -173,6 +172,125 @@ def test_dispatch_unknown_subcommand_returns_usage() -> None: assert cli_hermes.dispatch_hermes(args) == 2 +# ── env / config helpers ──────────────────────────────────────────────────── +def test_parse_env_file_handles_quotes_export_comments(tmp_path: Path) -> None: + p = tmp_path / ".env" + p.write_text( + "# comment\n" + "\n" + "export API_SERVER_KEY='secret'\n" + 'PATTER_LANGUAGE="it"\n' + "BARE=value\n" + "NOEQUALS\n", + encoding="utf-8", + ) + parsed = cli_hermes._parse_env_file(p) + assert parsed == { + "API_SERVER_KEY": "secret", + "PATTER_LANGUAGE": "it", + "BARE": "value", + } + + +def test_parse_env_file_missing_returns_empty(tmp_path: Path) -> None: + assert cli_hermes._parse_env_file(tmp_path / "nope.env") == {} + + +def test_upsert_env_file_replaces_and_appends(tmp_path: Path) -> None: + p = tmp_path / ".env" + p.write_text("# header\nAPI_SERVER_PORT=8642\nKEEP=me\n", encoding="utf-8") + cli_hermes._upsert_env_file(p, {"API_SERVER_PORT": "9000", "NEW": "x"}) + text = p.read_text(encoding="utf-8") + assert "# header" in text # comments preserved + assert "KEEP=me" in text + assert "API_SERVER_PORT=9000" in text + assert "API_SERVER_PORT=8642" not in text + assert "NEW=x" in text + + +def test_upsert_env_file_creates_when_missing(tmp_path: Path) -> None: + p = tmp_path / "sub" / ".env" + cli_hermes._upsert_env_file(p, {"A": "1"}) + assert p.read_text(encoding="utf-8").strip() == "A=1" + + +def test_load_env_files_does_not_override_existing(tmp_path: Path, monkeypatch) -> None: + p = tmp_path / ".env" + 
p.write_text("FOO=fromfile\nBAR=baz\n", encoding="utf-8") + monkeypatch.setenv("FOO", "fromenv") + monkeypatch.delenv("BAR", raising=False) + applied = cli_hermes._load_env_files([p]) + assert applied == [p] + assert os.environ["FOO"] == "fromenv" # not overridden + assert os.environ["BAR"] == "baz" # newly loaded + + +def test_read_hermes_config_env_overrides_yaml(tmp_path: Path, monkeypatch) -> None: + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + (tmp_path / "config.yaml").write_text( + "api_server:\n enabled: true\n port: 8642\n key: fromyaml\n", + encoding="utf-8", + ) + (tmp_path / ".env").write_text("API_SERVER_KEY=fromenv\n", encoding="utf-8") + cfg = cli_hermes._read_hermes_config() + assert cfg["API_SERVER_ENABLED"] == "True" + assert cfg["API_SERVER_PORT"] == "8642" + assert cfg["API_SERVER_KEY"] == "fromenv" # .env wins + + +def test_read_hermes_config_absent_home(tmp_path: Path, monkeypatch) -> None: + monkeypatch.setenv("HERMES_HOME", str(tmp_path / "missing")) + assert cli_hermes._read_hermes_config() == {} + + +def test_enable_hermes_gateway_writes_and_backs_up(tmp_path: Path, monkeypatch) -> None: + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + env_path = tmp_path / ".env" + env_path.write_text("API_SERVER_PORT=8642\n", encoding="utf-8") + key = cli_hermes._enable_hermes_gateway() + assert (tmp_path / ".env.bak").exists() + parsed = cli_hermes._parse_env_file(env_path) + assert parsed["API_SERVER_ENABLED"] == "true" + assert parsed["API_SERVER_KEY"] == key # returned key matches what was written + assert parsed["API_SERVER_PORT"] == "8642" # preserved + + +def test_enable_hermes_gateway_keeps_existing_key(tmp_path: Path, monkeypatch) -> None: + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + (tmp_path / ".env").write_text("API_SERVER_KEY=keepme\n", encoding="utf-8") + key = cli_hermes._enable_hermes_gateway() + assert key == "keepme" + + +# ── severity + autoload integration ────────────────────────────────────────── +def 
test_cli_missing_and_gateway_down_is_failure(tmp_path: Path, monkeypatch) -> None: + monkeypatch.setenv("HERMES_HOME", str(tmp_path / "missing")) + monkeypatch.setattr(cli_hermes.shutil, "which", lambda _name: None) + monkeypatch.setattr("httpx.get", lambda *a, **k: (_ for _ in ()).throw(OSError("refused"))) + sec = cli_hermes._check_hermes("http://127.0.0.1:8642/v1", network=True) + by_label = {c.label: c.status for c in sec.checks} + assert by_label["CLI not found"] == cli_hermes.FAIL + assert by_label["Gateway unreachable"] == cli_hermes.FAIL + + +def test_doctor_autoloads_env_file(tmp_path: Path, monkeypatch, capsys) -> None: + monkeypatch.delenv("DEEPGRAM_API_KEY", raising=False) + env = tmp_path / ".env" + env.write_text("DEEPGRAM_API_KEY=dg-from-file\n", encoding="utf-8") + args = argparse.Namespace( + base_url=None, + no_network=True, + json=True, + env_file=[str(env)], + no_env_file=False, + ) + cli_hermes.cmd_doctor(args) + assert os.environ.get("DEEPGRAM_API_KEY") == "dg-from-file" + out = capsys.readouterr().out + assert "dg-from-file" not in out # secrets aren't echoed + assert str(env) in out # but the loaded path is reported + + def test_parser_wires_subcommands() -> None: parser = argparse.ArgumentParser() sub = parser.add_subparsers(dest="command") From 86c37d5fd507f173e43b1c116aa742b6f2b3c71e Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 9 Jun 2026 11:06:24 +0000 Subject: [PATCH 10/11] =?UTF-8?q?feat(hermes):=20end-to-end=20setup=20acce?= =?UTF-8?q?ptance=20=E2=80=94=20test,=20gateway=20start+wait,=20trace/diag?= =?UTF-8?q?nose?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Close the acceptance + debugging gaps so a green run means a real call works. setup: - --start-gateway runs `hermes gateway start` then polls /v1/models until the gateway answers, completing the enable → start → verify cycle. 
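
As a rough illustration of the start-then-wait step, the readiness poll could be sketched as below. This is a hypothetical helper, not the `_wait_for_gateway` added by this patch: the name `wait_until_ready`, its parameters, and the injectable `probe`, `sleep`, and `clock` callables are assumptions, chosen so the sketch runs without a live gateway.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    *,
    timeout_s: float = 30.0,   # illustrative default, not the CLI's value
    interval_s: float = 0.5,   # illustrative default, not the CLI's value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call ``probe`` until it returns True or ``timeout_s`` elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # A refused connection while the service boots is expected;
            # treat it the same as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice `probe` would issue a GET against the gateway's `/v1/models` endpoint and report whether it answered with HTTP 200. Injecting `sleep` and `clock` keeps the loop testable without real waiting.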
New `patter hermes test` — acceptance, not just preflight: GET /v1/models, send a real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS keys. Exit non-zero on any blocker. New `patter hermes trace [call]` / `diagnose [call]` — read the on-disk per-call log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose` applies a decision tree and names the first broken stage with a fix, e.g. "Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST transport." Defaults to the latest call; accepts a call_id or a directory. Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by the SDK — serve() auto-configures the Twilio/Plivo webhook once the tunnel is up (server.py) — so the scaffold app does it on `python app.py`; documented. Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated. 
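
The "first broken stage" rule that `diagnose` applies can be sketched in a few lines. This is a simplified illustration, not the code in `cli_hermes.py`: the stage keys and the `first_broken_stage` name are assumptions made for the example.

```python
from typing import Dict, Optional

# Pipeline order as described above: carrier -> STT -> Hermes (LLM) -> TTS.
STAGES = ("carrier", "stt", "llm", "tts")


def first_broken_stage(ok: Dict[str, bool]) -> Optional[str]:
    """Return the earliest stage that did not succeed, or None if all did."""
    for stage in STAGES:
        # A stage with no recorded result counts as broken.
        if not ok.get(stage, False):
            return stage
    return None
```

Walking the stages in pipeline order matters: a missing transcript also means no reply and no audio, so only the earliest failure is worth reporting.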
https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts --- docs/integrations/hermes.mdx | 19 + examples/hermes-phone-agent/.env.example | 2 + examples/hermes-phone-agent/README.md | 16 + .../python/getpatter/_hermes_scaffold.py | 18 + libraries/python/getpatter/cli_hermes.py | 431 +++++++++++++++++- .../python/tests/unit/test_hermes_cli.py | 180 ++++++++ libraries/typescript/src/cli.ts | 7 +- 7 files changed, 669 insertions(+), 4 deletions(-) diff --git a/docs/integrations/hermes.mdx b/docs/integrations/hermes.mdx index ce7dccf..8c9118b 100644 --- a/docs/integrations/hermes.mdx +++ b/docs/integrations/hermes.mdx @@ -202,6 +202,25 @@ patter hermes numbers # list the numbers on your Twilio account patter hermes attach-number +15551234567 --url https:///calls/inbound ``` +To go from a freshly enabled gateway to a verified one in a single run, add +`--start-gateway` — `setup` then runs `hermes gateway start` and waits for `/v1/models` to +answer before continuing. Before placing a real call, run the end-to-end acceptance check, +which sends an actual chat turn through the gateway (with the Hermes session header) and +confirms your providers are ready: + +```bash +patter hermes test # /v1/models + a real /v1/chat/completions turn + provider keys +``` + +When a call misbehaves, point Patter's per-call log (`PATTER_LOG_DIR`) at the tracer to see +exactly which stage broke — carrier → STT → Hermes → TTS — with a latency breakdown and a +one-line verdict: + +```bash +patter hermes trace # latest call's pipeline stages + stt/llm/tts latency +patter hermes diagnose # e.g. "Hermes replied but no audio — TTS stage" + the fix +``` + These commands live in the Python SDK today; the `HermesLLM` provider itself is available in both the Python and TypeScript SDKs. 
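
If you would rather script against the per-call log than read the CLI output, a minimal sketch follows. It assumes the layout that `cli_hermes.py` reads: a `transcript.jsonl` in each call directory, one JSON object per turn, with a `latency` object holding fields such as `stt_ms`, `llm_ms`, and `tts_ms`. `average_latency` is a hypothetical helper, not part of the SDK.

```python
import json
from pathlib import Path
from typing import Dict

# Latency fields assumed to appear under each turn's "latency" object.
STAGES = ("stt_ms", "llm_ms", "tts_ms", "total_ms")


def average_latency(call_dir: Path) -> Dict[str, int]:
    """Average each positive latency field across the turns of one call."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    text = (call_dir / "transcript.jsonl").read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            turn = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written row
        latency = turn.get("latency") if isinstance(turn, dict) else None
        if not isinstance(latency, dict):
            continue
        for stage in STAGES:
            value = latency.get(stage)
            if isinstance(value, (int, float)) and value > 0:
                sums[stage] = sums.get(stage, 0.0) + value
                counts[stage] = counts.get(stage, 0) + 1
    # Only stages that were recorded at least once are reported.
    return {stage: int(sums[stage] / counts[stage]) for stage in sums}
```

Pass it the directory of a single call under `PATTER_LOG_DIR`. Stages that never recorded a positive value are left out of the result rather than reported as zero.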
diff --git a/examples/hermes-phone-agent/.env.example b/examples/hermes-phone-agent/.env.example index de760cf..0b382e3 100644 --- a/examples/hermes-phone-agent/.env.example +++ b/examples/hermes-phone-agent/.env.example @@ -10,6 +10,8 @@ PATTER_PHONE_NUMBER=+15551234567 PATTER_LANGUAGE=en # REST is the safer default for a first PSTN demo; set to ws for streaming. PATTER_ELEVENLABS_TRANSPORT=rest +# Per-call logs — enables `patter hermes trace` / `patter hermes diagnose`. +PATTER_LOG_DIR=./patter-logs # ── Twilio carrier ──────────────────────────────────────────────────── TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx diff --git a/examples/hermes-phone-agent/README.md b/examples/hermes-phone-agent/README.md index 3b5298c..cc7af5c 100644 --- a/examples/hermes-phone-agent/README.md +++ b/examples/hermes-phone-agent/README.md @@ -50,6 +50,22 @@ Now call your number and talk to Hermes. python scripts/test_outbound_call.py +15557654321 ``` +## Debug a call + +With `PATTER_LOG_DIR` set (see `.env`), Patter writes a per-call log. After a +call, inspect what happened stage by stage, or get a one-line verdict: + +```bash +patter hermes trace # latest call: carrier → STT → Hermes → TTS + latency +patter hermes diagnose # "Hermes replied but no audio — TTS stage" + fix +``` + +Before placing a call at all, confirm the brain answers and providers are ready: + +```bash +patter hermes test # /v1/models + a real chat turn + provider keys +``` + ## Why Patter instead of a hosted custom-LLM voice agent? - **Hermes stays private.** A hosted platform has to reach your "brain" endpoint diff --git a/libraries/python/getpatter/_hermes_scaffold.py b/libraries/python/getpatter/_hermes_scaffold.py index fe9b7a5..0c8ba89 100644 --- a/libraries/python/getpatter/_hermes_scaffold.py +++ b/libraries/python/getpatter/_hermes_scaffold.py @@ -94,6 +94,8 @@ PATTER_LANGUAGE=en # REST is the safer default for a first PSTN demo; set to ws for streaming. 
PATTER_ELEVENLABS_TRANSPORT=rest +# Per-call logs — enables `patter hermes trace` / `patter hermes diagnose`. +PATTER_LOG_DIR=./patter-logs # ── Twilio carrier ──────────────────────────────────────────────────── TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx @@ -159,6 +161,22 @@ python scripts/test_outbound_call.py +15557654321 ``` +## Debug a call + +With `PATTER_LOG_DIR` set (see `.env`), Patter writes a per-call log. After a +call, inspect what happened stage by stage, or get a one-line verdict: + +```bash +patter hermes trace # latest call: carrier → STT → Hermes → TTS + latency +patter hermes diagnose # "Hermes replied but no audio — TTS stage" + fix +``` + +Before placing a call at all, confirm the brain answers and providers are ready: + +```bash +patter hermes test # /v1/models + a real chat turn + provider keys +``` + ## Why Patter instead of a hosted custom-LLM voice agent? - **Hermes stays private.** A hosted platform has to reach your "brain" endpoint diff --git a/libraries/python/getpatter/cli_hermes.py b/libraries/python/getpatter/cli_hermes.py index 4e271a1..7fda0b1 100644 --- a/libraries/python/getpatter/cli_hermes.py +++ b/libraries/python/getpatter/cli_hermes.py @@ -653,6 +653,15 @@ def cmd_setup(args: argparse.Namespace) -> int: if loaded: print("Loaded env from: " + ", ".join(str(p) for p in loaded)) + # 1b. Optionally start the gateway and wait for readiness — completes the + # enable → start → verify cycle so the preflight sees a live gateway. + if getattr(args, "start_gateway", False) and not getattr(args, "no_network", False): + base_url = _hermes_base_url(getattr(args, "base_url", None)) + key = gateway_key or os.environ.get("API_SERVER_KEY", "") + if _start_gateway(): + _wait_for_gateway(base_url, key) + print() + # 2. Preflight. 
print("Checking your environment…") sections = _run_doctor(args) @@ -718,6 +727,111 @@ def cmd_setup(args: argparse.Namespace) -> int: return 1 if failures else 0 +def _chat_turn_check(base_url: str, key: str, model: str, prompt: str) -> Check: + """Send one ``/chat/completions`` turn with Hermes session headers.""" + import time + + try: + import httpx + except ImportError: # pragma: no cover + return Check(SKIP, "Chat turn", "httpx not installed") + headers = { + "Content-Type": "application/json", + # Mirror HermesLLM: per-call continuity is carried in headers. + "X-Hermes-Session-Id": "patter-cli-test", + } + if key: + headers["Authorization"] = f"Bearer {key}" + payload = { + "model": model, + "messages": [{"role": "user", "content": prompt}], + "stream": False, + } + start = time.monotonic() + try: + resp = httpx.post( + f"{base_url}/chat/completions", json=payload, headers=headers, timeout=120.0 + ) + except Exception as exc: # noqa: BLE001 + return Check(FAIL, "Chat turn", str(exc), "patter hermes doctor") + elapsed = int((time.monotonic() - start) * 1000) + if resp.status_code != 200: + return Check( + FAIL, + "Chat turn", + f"HTTP {resp.status_code}: {resp.text[:160]}", + "check the model name and API_SERVER_KEY", + ) + try: + content = resp.json()["choices"][0]["message"]["content"] + except Exception: # noqa: BLE001 + return Check(FAIL, "Chat turn", "200 but no choices[0].message.content") + snippet = " ".join((content or "").split())[:60] + if not snippet: + return Check(WARN, "Chat turn", f"empty reply ({elapsed} ms)") + return Check(OK, "Chat turn", f'{elapsed} ms — "{snippet}…"') + + +def cmd_test(args: argparse.Namespace) -> int: + """End-to-end acceptance: gateway + a real chat turn + provider readiness.""" + loaded = _apply_env(args) + base_url = _hermes_base_url(getattr(args, "base_url", None)) + hermes_cfg = _read_hermes_config() + key = os.environ.get("API_SERVER_KEY", "") or hermes_cfg.get("API_SERVER_KEY", "") + model = 
os.environ.get("API_SERVER_MODEL_NAME") or hermes_cfg.get( + "API_SERVER_MODEL_NAME", "hermes-agent" + ) + + sec = Section("Hermes acceptance") + status, body, err = _get_json( + f"{base_url}/models", + headers={"Authorization": f"Bearer {key}"} if key else None, + ) + if status == 200: + sec.checks.append(Check(OK, "Gateway reachable", base_url)) + ids = _model_ids(body) + sec.checks.append( + Check(OK, "Model available", model) + if model in ids + else Check(WARN, "Model not found", f"{model!r} not in {sorted(ids)[:5]}") + ) + sec.checks.append( + _chat_turn_check(base_url, key, model, getattr(args, "prompt", None) or + "Reply with one short spoken sentence to confirm you are online.") + ) + else: + detail = f"HTTP {status}" if status else (err or "no response") + sec.checks.append( + Check(FAIL, "Gateway reachable", f"{base_url} — {detail}", "patter hermes doctor") + ) + + # HermesLLM + provider readiness (so a green `test` means a real call can run). + try: + from getpatter import HermesLLM + + HermesLLM() + sec.checks.append(Check(OK, "HermesLLM constructible")) + except Exception as exc: # noqa: BLE001 + sec.checks.append(Check(FAIL, "HermesLLM construction failed", str(exc))) + sec.checks.append(_env_key("DEEPGRAM_API_KEY", "Deepgram STT")) + sec.checks.append(_env_key("ELEVENLABS_API_KEY", "ElevenLabs TTS")) + + report = _sections_to_dict([sec]) + if getattr(args, "json", False): + report["loaded_env_files"] = [str(p) for p in loaded] + print(json.dumps(report, indent=2)) + else: + if loaded: + print("Loaded env from: " + ", ".join(str(p) for p in loaded)) + _print_sections([sec]) + print() + if report["failures"]: + print(f"{report['failures']} blocker(s) — fix before calling.") + else: + print("Acceptance passed — Hermes is answering and providers are ready.") + return 1 if report["failures"] else 0 + + def cmd_attach_number(args: argparse.Namespace) -> int: return _attach_number(args.number, args.url, args.status_callback) @@ -758,6 +872,238 @@ def 
cmd_numbers(args: argparse.Namespace) -> int: return 0 +# ────────────────────────────────────────────────────────────────────────── +# Call trace / diagnose (reads the on-disk call log; see services/call_log.py) +# ────────────────────────────────────────────────────────────────────────── +def _call_log_root(override: str | None) -> Path | None: + from getpatter.services.call_log import resolve_log_root + + return resolve_log_root(override) + + +def _read_jsonl(path: Path) -> list[dict]: + rows: list[dict] = [] + try: + text = path.read_text(encoding="utf-8") + except OSError: + return rows + for line in text.splitlines(): + line = line.strip() + if not line: + continue + try: + rows.append(json.loads(line)) + except json.JSONDecodeError: + continue + return rows + + +def _find_call_dir(root: Path, call: str | None) -> Path | None: + """Locate a call directory under ``/calls`` (newest if ``call`` is None).""" + if call: + direct = Path(call) + if (direct / "metadata.json").exists(): + return direct + matches = list(root.glob(f"calls/**/{call}/metadata.json")) + return matches[0].parent if matches else None + metas = list(root.glob("calls/**/metadata.json")) + if not metas: + return None + newest = max(metas, key=lambda p: p.stat().st_mtime) + return newest.parent + + +def _load_call(call_dir: Path) -> dict: + meta: dict = {} + try: + meta = json.loads((call_dir / "metadata.json").read_text(encoding="utf-8")) + except (OSError, json.JSONDecodeError): + pass + return { + "dir": call_dir, + "metadata": meta, + "turns": _read_jsonl(call_dir / "transcript.jsonl"), + "events": _read_jsonl(call_dir / "events.jsonl"), + } + + +def _turn_latency(turn: dict) -> dict: + lat = turn.get("latency") + return lat if isinstance(lat, dict) else {} + + +def _classify_stages(call: dict) -> list[tuple[str, str, str]]: + """Return ``(stage, status, detail)`` for each pipeline stage of one call.""" + meta = call["metadata"] + turns = call["turns"] + events = call["events"] + + def 
any_turn(pred) -> bool: + return any(pred(t) for t in turns) + + has_stt = any_turn(lambda t: bool(t.get("user_text"))) or any_turn( + lambda t: (t.get("stt_audio_seconds") or 0) > 0 + ) + has_llm = any_turn(lambda t: bool(t.get("agent_text"))) or any_turn( + lambda t: (_turn_latency(t).get("llm_ms") or 0) > 0 + ) + has_tts = any_turn(lambda t: (t.get("tts_characters") or 0) > 0) or any_turn( + lambda t: (_turn_latency(t).get("tts_ms") or 0) > 0 + ) + bargeins = sum(1 for e in events if e.get("type") == "barge_in") + errors = [e for e in events if e.get("type") == "error"] + + out: list[tuple[str, str, str]] = [] + provider = meta.get("telephony_provider") or "?" + out.append( + ( + "Call reached Patter", + OK if meta else FAIL, + f"{meta.get('direction', '?')} via {provider}, status={meta.get('status', '?')}" + if meta + else "no metadata.json", + ) + ) + out.append( + ("Caller transcribed (STT)", OK if has_stt else FAIL, f"{len(turns)} turn(s)") + ) + out.append(("Hermes replied (LLM)", OK if has_llm else FAIL, "")) + out.append(("Spoken back (TTS)", OK if has_tts else FAIL, "")) + out.append( + ( + "Barge-in", + OK if bargeins else SKIP, + f"{bargeins} event(s)" if bargeins else "none recorded", + ) + ) + if errors or meta.get("error"): + detail = meta.get("error") or errors[0].get("data", {}) + out.append(("Errors", WARN, str(detail)[:120])) + return out + + +def _latency_summary(turns: list[dict]) -> str: + keys = ("stt_ms", "llm_ttft_ms", "llm_ms", "tts_ms", "total_ms") + sums: dict[str, float] = {k: 0.0 for k in keys} + counts: dict[str, int] = {k: 0 for k in keys} + for t in turns: + lat = _turn_latency(t) + for k in keys: + v = lat.get(k) + if isinstance(v, (int, float)) and v > 0: + sums[k] += v + counts[k] += 1 + parts = [ + f"{k}={int(sums[k] / counts[k])}ms" for k in keys if counts[k] + ] + return " ".join(parts) if parts else "no latency recorded" + + +def _diagnose_verdict(call: dict) -> tuple[str, str]: + """Decision tree → ``(verdict, 
suggested_fix)`` for the first broken stage.""" + stages = {name: status for name, status, _ in _classify_stages(call)} + if stages.get("Call reached Patter") == FAIL: + return ( + "No call record — the carrier webhook never reached Patter.", + "Check the number's voice webhook (patter hermes attach-number) and " + "that the tunnel was up.", + ) + if stages.get("Caller transcribed (STT)") == FAIL: + return ( + "Audio reached Patter but produced no transcript — STT/VAD stage.", + "Check DEEPGRAM_API_KEY, the STT language/model, and that media streamed.", + ) + if stages.get("Hermes replied (LLM)") == FAIL: + return ( + "Transcript captured but Hermes never replied — gateway/LLM stage.", + "Run `patter hermes test`; check the gateway is up and the key matches.", + ) + if stages.get("Spoken back (TTS)") == FAIL: + return ( + "Hermes replied but no audio was synthesized — TTS stage.", + "Check ELEVENLABS_API_KEY and use REST transport on PSTN " + "(PATTER_ELEVENLABS_TRANSPORT=rest).", + ) + return ("Pipeline looks healthy end-to-end.", "") + + +def _resolve_trace_call(args: argparse.Namespace) -> tuple[Path | None, str]: + """Shared resolution for trace/diagnose. Returns ``(call_dir, error_msg)``.""" + root = _call_log_root(getattr(args, "log_dir", None)) + if root is None: + return None, ( + "Call logging is off. Set PATTER_LOG_DIR (or pass --log-dir) so Patter " + "writes per-call logs, then place a call and retry." + ) + call_dir = _find_call_dir(root, getattr(args, "call", None)) + if call_dir is None: + which = getattr(args, "call", None) or "any call" + return None, f"No call log found for {which} under {root}/calls." 
+ return call_dir, "" + + +def cmd_trace(args: argparse.Namespace) -> int: + _apply_env(args) + call_dir, errmsg = _resolve_trace_call(args) + if call_dir is None: + print(errmsg, file=sys.stderr) + return 2 + call = _load_call(call_dir) + stages = _classify_stages(call) + if getattr(args, "json", False): + print( + json.dumps( + { + "call_id": call["metadata"].get("call_id"), + "dir": str(call_dir), + "stages": [ + {"stage": n, "status": s, "detail": d} for n, s, d in stages + ], + "latency": _latency_summary(call["turns"]), + }, + indent=2, + ) + ) + return 0 + meta = call["metadata"] + print(f"Call {meta.get('call_id', call_dir.name)} ({call_dir})") + for name, status, detail in stages: + sym = _color(_SYMBOL.get(status, "?"), status) + line = f" {sym} {name}" + if detail: + line += f": {detail}" + print(line) + print(f"\n latency: {_latency_summary(call['turns'])}") + return 0 + + +def cmd_diagnose(args: argparse.Namespace) -> int: + _apply_env(args) + call_dir, errmsg = _resolve_trace_call(args) + if call_dir is None: + print(errmsg, file=sys.stderr) + return 2 + call = _load_call(call_dir) + verdict, fix = _diagnose_verdict(call) + if getattr(args, "json", False): + print( + json.dumps( + { + "call_id": call["metadata"].get("call_id"), + "verdict": verdict, + "fix": fix, + }, + indent=2, + ) + ) + return 0 + print(f"Call {call['metadata'].get('call_id', call_dir.name)}") + print(f" {verdict}") + if fix: + print(f" fix: {fix}") + return 0 + + # ────────────────────────────────────────────────────────────────────────── # Hermes gateway enablement (the one step that writes to ~/.hermes) # ────────────────────────────────────────────────────────────────────────── @@ -798,6 +1144,56 @@ def _enable_hermes_gateway() -> str: return key +def _start_gateway() -> bool: + """Start the Hermes gateway via the CLI. Returns True on success. + + Patter does not own the service — this is a convenience that shells out to + ``hermes gateway start`` when the CLI is available. 
+ """ + if not shutil.which("hermes"): + print( + "Cannot start the gateway: hermes CLI not found. Start it your usual " + "way, then re-run without --no-network to verify." + ) + return False + import subprocess + + print("Starting the Hermes gateway (hermes gateway start)…") + try: + proc = subprocess.run( + ["hermes", "gateway", "start"], + capture_output=True, + text=True, + timeout=60, + ) + except Exception as exc: # noqa: BLE001 + print(f"Could not start the gateway: {exc}") + return False + if proc.returncode == 0: + return True + print((proc.stderr or proc.stdout or "").strip()[:300]) + return False + + +def _wait_for_gateway( + base_url: str, key: str, *, timeout: float = 60.0, interval: float = 2.0 +) -> bool: + """Poll ``{base_url}/models`` until it answers 200 or the timeout elapses.""" + import time + + headers = {"Authorization": f"Bearer {key}"} if key else {} + deadline = time.monotonic() + timeout + print(f"Waiting for the gateway at {base_url} (up to {int(timeout)}s)…") + while time.monotonic() < deadline: + status, _body, _err = _get_json(f"{base_url}/models", headers=headers, timeout=3.0) + if status == 200: + print("✓ Gateway is ready.") + return True + time.sleep(interval) + print("✗ Gateway did not become ready in time.") + return False + + # ────────────────────────────────────────────────────────────────────────── # Twilio helpers # ────────────────────────────────────────────────────────────────────────── @@ -929,6 +1325,11 @@ def build_hermes_parser(subparsers: argparse._SubParsersAction) -> argparse.Argu action="store_true", help="Write API_SERVER_ENABLED=true (+ key) to ~/.hermes/.env (backed up)", ) + setup.add_argument( + "--start-gateway", + action="store_true", + help="Run `hermes gateway start` and wait for /v1/models readiness", + ) setup.add_argument( "--env-file", action="append", default=None, help="dotenv file(s) to load" ) @@ -936,6 +1337,27 @@ def build_hermes_parser(subparsers: argparse._SubParsersAction) -> 
argparse.Argu "--no-env-file", action="store_true", help="Do not autoload any .env file" ) + test = hsub.add_parser("test", help="Acceptance: gateway + a real chat turn + providers") + test.add_argument("--base-url", default=None, help="Hermes gateway base URL") + test.add_argument("--prompt", default=None, help="Prompt to send for the chat turn") + test.add_argument("--json", action="store_true", help="Machine-readable output") + test.add_argument("--env-file", action="append", default=None, help="dotenv file(s)") + test.add_argument("--no-env-file", action="store_true", help="Do not autoload .env") + + trace = hsub.add_parser("trace", help="Show the pipeline stages of a logged call") + trace.add_argument("call", nargs="?", default=None, help="call_id or dir (default: latest)") + trace.add_argument("--log-dir", default=None, help="Call log root (else PATTER_LOG_DIR)") + trace.add_argument("--json", action="store_true", help="Machine-readable output") + trace.add_argument("--env-file", action="append", default=None, help="dotenv file(s)") + trace.add_argument("--no-env-file", action="store_true", help="Do not autoload .env") + + diagnose = hsub.add_parser("diagnose", help="Classify where a logged call broke") + diagnose.add_argument("call", nargs="?", default=None, help="call_id or dir (default: latest)") + diagnose.add_argument("--log-dir", default=None, help="Call log root (else PATTER_LOG_DIR)") + diagnose.add_argument("--json", action="store_true", help="Machine-readable output") + diagnose.add_argument("--env-file", action="append", default=None, help="dotenv file(s)") + diagnose.add_argument("--no-env-file", action="store_true", help="Do not autoload .env") + attach = hsub.add_parser("attach-number", help="Point a Twilio number at your Patter URL") attach.add_argument("number", help="Phone number in E.164 (e.g. 
+15551234567)") attach.add_argument("--url", required=True, help="Public voice webhook URL (https)") @@ -952,12 +1374,19 @@ def dispatch_hermes(args: argparse.Namespace) -> int: return cmd_doctor(args) if command == "setup": return cmd_setup(args) + if command == "test": + return cmd_test(args) + if command == "trace": + return cmd_trace(args) + if command == "diagnose": + return cmd_diagnose(args) if command == "attach-number": return cmd_attach_number(args) if command == "numbers": return cmd_numbers(args) print( - "Usage: patter hermes {doctor|setup|attach-number|numbers}\n" + "Usage: patter hermes " + "{doctor|setup|test|trace|diagnose|attach-number|numbers}\n" "Try: patter hermes doctor", file=sys.stderr, ) diff --git a/libraries/python/tests/unit/test_hermes_cli.py b/libraries/python/tests/unit/test_hermes_cli.py index 5877dbd..0542562 100644 --- a/libraries/python/tests/unit/test_hermes_cli.py +++ b/libraries/python/tests/unit/test_hermes_cli.py @@ -300,3 +300,183 @@ def test_parser_wires_subcommands() -> None: assert ns.hermes_command == "attach-number" assert ns.number == "+15551234567" assert ns.url == "https://x/y" + + +def test_parser_wires_test_trace_diagnose() -> None: + parser = argparse.ArgumentParser() + sub = parser.add_subparsers(dest="command") + cli_hermes.build_hermes_parser(sub) + for name in ("test", "trace", "diagnose"): + ns = parser.parse_args(["hermes", name]) + assert ns.hermes_command == name + + +# ── gateway lifecycle ──────────────────────────────────────────────────────── +def test_start_gateway_no_cli(monkeypatch, capsys) -> None: + monkeypatch.setattr(cli_hermes.shutil, "which", lambda _n: None) + assert cli_hermes._start_gateway() is False + assert "hermes CLI not found" in capsys.readouterr().out + + +def test_start_gateway_success(monkeypatch) -> None: + monkeypatch.setattr(cli_hermes.shutil, "which", lambda _n: "/usr/bin/hermes") + + class Proc: + returncode = 0 + stdout = "started" + stderr = "" + + 
monkeypatch.setattr("subprocess.run", lambda *a, **k: Proc()) + assert cli_hermes._start_gateway() is True + + +def test_wait_for_gateway_ready(monkeypatch) -> None: + monkeypatch.setattr(cli_hermes, "_get_json", lambda *a, **k: (200, {}, "")) + assert cli_hermes._wait_for_gateway("http://x/v1", "k", timeout=1, interval=0.01) is True + + +def test_wait_for_gateway_times_out(monkeypatch) -> None: + monkeypatch.setattr(cli_hermes, "_get_json", lambda *a, **k: (None, None, "down")) + assert ( + cli_hermes._wait_for_gateway("http://x/v1", "k", timeout=0.05, interval=0.01) + is False + ) + + +# ── acceptance test command ────────────────────────────────────────────────── +def test_chat_turn_check_ok(monkeypatch) -> None: + class Resp: + status_code = 200 + + @staticmethod + def json(): + return {"choices": [{"message": {"content": "Hi there"}}]} + + monkeypatch.setattr("httpx.post", lambda *a, **k: Resp()) + c = cli_hermes._chat_turn_check("http://x/v1", "k", "hermes-agent", "hi") + assert c.status == cli_hermes.OK + assert "Hi there" in c.detail + + +def test_chat_turn_check_http_error(monkeypatch) -> None: + class Resp: + status_code = 500 + text = "boom" + + monkeypatch.setattr("httpx.post", lambda *a, **k: Resp()) + c = cli_hermes._chat_turn_check("http://x/v1", "k", "m", "hi") + assert c.status == cli_hermes.FAIL + + +def test_cmd_test_passes_when_gateway_and_turn_ok(monkeypatch, capsys) -> None: + monkeypatch.setenv("DEEPGRAM_API_KEY", "dg") + monkeypatch.setenv("ELEVENLABS_API_KEY", "el") + monkeypatch.setattr( + cli_hermes, "_get_json", lambda *a, **k: (200, {"data": [{"id": "hermes-agent"}]}, "") + ) + monkeypatch.setattr( + cli_hermes, + "_chat_turn_check", + lambda *a, **k: cli_hermes.Check(cli_hermes.OK, "Chat turn", "120 ms"), + ) + args = argparse.Namespace( + base_url=None, prompt=None, json=True, env_file=None, no_env_file=True + ) + assert cli_hermes.cmd_test(args) == 0 + assert '"failures": 0' in capsys.readouterr().out + + +# ── trace / diagnose 
───────────────────────────────────────────────────────── +def _make_call(root: Path, call_id: str, *, meta: dict, turns: list[dict], events=None): + d = root / "calls" / "2026" / "06" / "09" / call_id + d.mkdir(parents=True, exist_ok=True) + (d / "metadata.json").write_text( + __import__("json").dumps({"call_id": call_id, **meta}), encoding="utf-8" + ) + (d / "transcript.jsonl").write_text( + "\n".join(__import__("json").dumps(t) for t in turns), encoding="utf-8" + ) + if events: + (d / "events.jsonl").write_text( + "\n".join(__import__("json").dumps(e) for e in events), encoding="utf-8" + ) + return d + + +def test_find_call_dir_latest_and_by_id(tmp_path: Path) -> None: + _make_call(tmp_path, "CA1", meta={"status": "x"}, turns=[{"user_text": "a"}]) + d2 = _make_call(tmp_path, "CA2", meta={"status": "x"}, turns=[{"user_text": "b"}]) + # latest by mtime is CA2 + assert cli_hermes._find_call_dir(tmp_path, None) == d2 + assert cli_hermes._find_call_dir(tmp_path, "CA1").name == "CA1" + assert cli_hermes._find_call_dir(tmp_path, "nope") is None + + +def test_classify_stages_healthy(tmp_path: Path) -> None: + d = _make_call( + tmp_path, + "CA", + meta={"status": "completed", "telephony_provider": "twilio"}, + turns=[{"user_text": "hi", "agent_text": "hello", "tts_characters": 10}], + events=[{"type": "barge_in", "data": {}}], + ) + stages = {n: s for n, s, _ in cli_hermes._classify_stages(cli_hermes._load_call(d))} + assert stages["Caller transcribed (STT)"] == cli_hermes.OK + assert stages["Hermes replied (LLM)"] == cli_hermes.OK + assert stages["Spoken back (TTS)"] == cli_hermes.OK + + +def test_diagnose_tts_stage(tmp_path: Path) -> None: + d = _make_call( + tmp_path, + "CA", + meta={"status": "completed"}, + turns=[{"user_text": "hi", "agent_text": "hello", "tts_characters": 0}], + ) + verdict, fix = cli_hermes._diagnose_verdict(cli_hermes._load_call(d)) + assert "TTS" in verdict + assert "ELEVENLABS_API_KEY" in fix + + +def test_diagnose_llm_stage(tmp_path: Path) -> 
None: + d = _make_call( + tmp_path, + "CA", + meta={"status": "completed"}, + turns=[{"user_text": "hi", "agent_text": "", "tts_characters": 0}], + ) + verdict, _fix = cli_hermes._diagnose_verdict(cli_hermes._load_call(d)) + assert "Hermes never replied" in verdict + + +def test_diagnose_stt_stage(tmp_path: Path) -> None: + d = _make_call( + tmp_path, "CA", meta={"status": "completed"}, turns=[{"user_text": ""}] + ) + verdict, _fix = cli_hermes._diagnose_verdict(cli_hermes._load_call(d)) + assert "no transcript" in verdict + + +def test_trace_no_log_dir(monkeypatch) -> None: + monkeypatch.delenv("PATTER_LOG_DIR", raising=False) + args = argparse.Namespace( + call=None, log_dir=None, json=False, env_file=None, no_env_file=True + ) + assert cli_hermes.cmd_trace(args) == 2 + + +def test_trace_json_output(tmp_path: Path, monkeypatch, capsys) -> None: + _make_call( + tmp_path, + "CA", + meta={"status": "completed", "telephony_provider": "twilio"}, + turns=[{"user_text": "hi", "agent_text": "yo", "tts_characters": 3, + "latency": {"llm_ttft_ms": 1000, "total_ms": 1500}}], + ) + args = argparse.Namespace( + call=None, log_dir=str(tmp_path), json=True, env_file=None, no_env_file=True + ) + assert cli_hermes.cmd_trace(args) == 0 + out = capsys.readouterr().out + assert '"call_id": "CA"' in out + assert "llm_ttft_ms" in out diff --git a/libraries/typescript/src/cli.ts b/libraries/typescript/src/cli.ts index 0d8fd77..3616ec2 100644 --- a/libraries/typescript/src/cli.ts +++ b/libraries/typescript/src/cli.ts @@ -45,11 +45,12 @@ function printEvalStub(): void { function printHermesStub(): void { console.log( - 'The Hermes setup wizard (doctor / setup / attach-number) lives in the\n' + - 'Python CLI today. Use it from the Python SDK:\n\n' + + 'The Hermes wizard (doctor / setup / test / trace / diagnose /\n' + + 'attach-number) lives in the Python CLI today. 
Use it from the Python SDK:\n\n' + ' pip install getpatter\n' + ' patter hermes doctor\n' + - ' patter hermes setup\n\n' + + ' patter hermes setup\n' + + ' patter hermes test\n\n' + 'The HermesLLM provider itself is fully available in this TypeScript SDK\n' + "(import { HermesLLM } from 'getpatter'). See\n" + 'https://docs.getpatter.com/integrations/hermes for docs.', From 1b575eef65b90b3265c21946b81c84ed3aa2d146 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 9 Jun 2026 11:32:18 +0000 Subject: [PATCH 11/11] docs(readme): condense telemetry note to a short opt-out callout https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index a012905..70f928f 100644 --- a/README.md +++ b/README.md @@ -132,6 +132,12 @@ cp .env.example .env # fill in your keys cd python && pip install -r requirements.txt && python main.py ``` +## Telemetry + +> **Note** Patter collects anonymous, opt-out usage data (SDK version, bucketed provider/model and call facts) to help us prioritise — never call content, prompts, phone numbers, keys, or free text. +> +> Opt out any time: `Patter(telemetry=False)` (`new Patter({ telemetry: false })`) or `PATTER_TELEMETRY_DISABLED=1` (also honours `DO_NOT_TRACK=1`); auto-off in CI/tests. Full details: [Telemetry](https://docs.getpatter.com/telemetry). + ## Star History