18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,9 @@

### Added

- **TypeScript namespace exports for the agent-LLM presets.** `import { hermes, openclaw, openaiCompatible } from "getpatter"` now works alongside the existing `HermesLLM` / `OpenClawLLM` / `OpenAICompatibleLLM` named exports, so `new hermes.LLM()` mirrors Python's `from getpatter.llm import hermes; hermes.LLM()`. `libraries/typescript/src/index.ts`.
- **`session_key_factory` / `sessionKeyFactory` — per-call long-term memory scope from a caller hash.** `OpenAICompatibleLLM` (and `HermesLLM`) can derive the `X-Hermes-Session-Key` header per call from a `SessionContext` (`call_id` / `caller` / `callee` / `caller_hash`) instead of a static value, so an agent runtime can remember a caller across calls **without the raw phone number ever reaching the wire or the logs**. Shortcut `HermesLLM(session_key_from="caller_hash")` installs a default `patter-caller-<caller_hash>` factory (SHA-256, 16 hex chars). New public `SessionContext` + `hash_caller` / `hashCaller` helper. The factory takes precedence over the static `session_key`; a falsy return omits the header. The loop dispatch was generalised to thread `caller` / `callee` only to providers whose `stream()` declares them (or `**kwargs`), keeping built-in and minimal custom providers unchanged. `libraries/python/getpatter/models.py`, `.../llm/openai_compatible.py`, `.../llm/hermes.py`, `.../services/llm_loop.py` + TypeScript mirrors.
- **`long_turn_message` / `longTurnMessage` — opt-in spoken filler during a slow turn.** When an LLM turn takes longer than `long_turn_message_after_s` (default 4 s) and no audio has reached the caller yet, Patter speaks a short configurable line (e.g. "One moment, let me check.") instead of dead silence — useful for agent runtimes (Hermes / OpenClaw) that run tools mid-turn. Distinct from `llm_error_message` (which fires on error): this fires on **slowness**, once per turn, gated on emitted audio so it never double-speaks. `None` / unset = off (no behaviour change). `libraries/python/getpatter/models.py`, `.../stream_handler.py`, `.../client.py` + TypeScript mirrors.
- **Anonymous usage telemetry (opt-out, on by default).** Patter now sends a
small, anonymous, fail-safe usage event when the SDK is initialised and when
an engine family is first used, so the maintainers can see which engines,
@@ -69,6 +72,21 @@
`libraries/python/getpatter/cli.py` / `telemetry/{events,consent,install_id,call_metrics}.py`
and the TypeScript mirrors; both SDKs verified byte-for-byte at parity.
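
The `session_key_factory` entry above says `caller` / `callee` are threaded only to providers whose `stream()` declares them (or accepts `**kwargs`). A minimal sketch of that kind of signature-based dispatch follows. The helper `accepted_context` and both provider classes are hypothetical names for illustration, not Patter's actual implementation.

```python
import inspect


def accepted_context(stream_fn, context):
    """Return only the context entries that stream_fn is able to accept."""
    params = inspect.signature(stream_fn).parameters
    # A **kwargs parameter means the provider accepts any extra keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(context)
    # Otherwise pass only the keywords the provider declares by name.
    return {key: value for key, value in context.items() if key in params}


class MinimalProvider:
    """Hypothetical custom provider that knows nothing about callers."""

    def stream(self, messages):
        return ("minimal", messages)


class CallerAwareProvider:
    """Hypothetical provider that opts in by declaring caller / callee."""

    def stream(self, messages, caller=None, callee=None):
        return ("aware", caller, callee)


context = {"caller": "+15551234567", "callee": "+15557654321"}
minimal_kwargs = accepted_context(MinimalProvider().stream, context)
aware_kwargs = accepted_context(CallerAwareProvider().stream, context)

# The minimal provider is called exactly as before; the aware one gets both.
MinimalProvider().stream([], **minimal_kwargs)
CallerAwareProvider().stream([], **aware_kwargs)
```

Inspecting the signature, rather than catching a `TypeError` after the call, keeps existing providers working without swallowing errors raised inside `stream()` itself.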

### Fixed

- **Multi-turn pipeline conversations no longer go silent after the first turn.** The agent answered the first turn but then ignored every subsequent utterance, leaving a ghost metrics turn of `user_text='' agent_text='[interrupted]'`. Two root causes in the pipeline turn-taking state machine:
  - **Tail-grace misclassified the next turn as a barge-in.** After the agent finishes speaking, `_end_speaking_with_grace` keeps `_is_speaking=true` for `PATTER_TTS_TAIL_GRACE_MS` (default 1500 ms) to swallow the fading TTS echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was treated as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a ≤260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. A new `_tail_grace_active` / `tailGraceActive` flag now distinguishes "actively streaming TTS" from "post-TTS echo guard"; a VAD `speech_start` (or a transcript) during the tail grace ends the grace and is dispatched as a clean new turn — recovering the leading audio from the ring instead of dropping it — with no spurious `record_turn_interrupted`. Tunable `PATTER_TTS_TAIL_GRACE_MS` (0 / 200 / 1500) is now safe for fast next-turn speech.
  - **(Python) A barge-in's per-turn cancel event leaked into the next turn.** `_llm_cancel_event` was only recreated *inside* `_process_streaming_response` — after `LLMLoop.run` had already been handed the (still-set) event for the next turn — so the turn following any real barge-in bailed immediately. The event is now recreated at the top of `_dispatch_turn`, before dispatch (TypeScript already allocated a fresh `AbortController` per turn). `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
- **Pipeline barge-in now works DURING a turn — including long Hermes/OpenClaw tool-running turns.** The caller could not interrupt the agent mid-response: the STT receive loop awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` / `await this.runPipelineLlm(...)`), so for the whole 30-90 s of a tool-running agent-runtime turn it stopped reading transcripts — a barge-in transcript was only processed *after* the turn ended (the caller's "ferma", Italian for "stop", was answered late). Three coordinated changes, full Python/TS parity:
  - **Decoupled, single-in-flight dispatch.** The turn now runs as one tracked background task (`_dispatch_task` / `dispatchTask`) so the receive loop keeps draining transcripts and runs barge-in detection against the LIVE turn. Exactly one dispatch is in flight: the loop settles the previous one before launching the next, so `conversation_history` / metrics ordering is unchanged. With no barge-in (default, VAD present, normal LLM) behaviour is unchanged — the loop still awaits the final turn to settle before returning.
  - **Prompt pre-first-token abort (Python).** Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk `cancel_event` check never runs. The provider now races `create()` + first-byte against the cancel signal and spawns a watchdog that `close()`s the response the instant a barge-in fires, so the request is torn down immediately instead of blocking the next turn (TS already aborts promptly via `fetch` + `AbortController`). The VAD legacy barge-in branch now also sets `_llm_cancel_event` (it previously only flipped `_is_speaking`), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast.
  - **`PATTER_FORWARD_STT_WHILE_SPEAKING` (opt-in, default off).** Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked PSTN links where the VAD never fires. The leading-edge ring buffer is still captured. **Echo caveat:** without AEC the agent's own voice may be transcribed as a phantom interruption — pair with `agent.barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `.../services/llm_loop.py`, `.../llm/openai_compatible.py`, `libraries/typescript/src/stream-handler.ts`.
- **Echo-safe barge-in: the agent no longer interrupts itself, and a fast real follow-up is no longer lost.** Hardening for the echo-prone agent-runtime case (`PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC), where the agent's own TTS bled into STT and was transcribed (e.g. a garbled fragment in another language not covered by the English hallucination filter), firing a phantom barge-in and leaving an empty `[interrupted]` turn:
  - **Echo guard** — a language-agnostic check (`_looks_like_echo` / `looksLikeEcho`: substring or ≥60% word overlap against the agent's in-flight spoken text) now drops any candidate barge-in/commit that is the agent's own speech echoing back. Active only while forwarding audio during TTS, so the default VAD path and real post-turn replies are untouched.
  - **Back-to-back dedup fix** — a final within 500 ms of the previous is now dropped only when it is a *near-duplicate* (Deepgram emitting `speech_final` then `is_final` for the same utterance). A genuinely different fast follow-up (e.g. the real interruption right after a suppressed phantom) is kept instead of being silently swallowed into an empty turn.
  - **Interrupted-turn context rewrite** — on a confirmed mid-turn barge-in the spoken prefix is recorded in history with an `[interrupted by caller]` marker (instead of an ungrounded full reply), so a stateful agent runtime (Hermes/OpenClaw, keyed by `X-Hermes-Session-Id`) sees on the next turn that it was cut off and what the caller actually heard. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
- **Forward-STT-without-AEC no longer self-interrupts on its own echo.** The remaining live Hermes/OpenClaw barge-in failure: with `PATTER_FORWARD_STT_WHILE_SPEAKING` on, no AEC, and no `barge_in_strategies`, a VAD `speech_start` during TTS cancelled the turn immediately — but on a no-AEC link that `speech_start` is very often the agent's *own* TTS echo (or pre-first-token line noise during a long tool-running turn). The result was a cascade of false-positive interruptions: a short normal reply like "bene bene" (Italian for "good, good") produced `agent_text='[interrupted]'` with `bargein_ms≈0`, and the next turn's LLM ran for seconds but emitted `tts_characters=0` because it was torn down before its first token. The echo guard existed only on the *transcript* path, so the raw VAD-energy cancel had no protection. The VAD-energy cancel is now **deferred to transcript confirmation** whenever audio is forwarded during TTS without AEC (`forward_stt_while_speaking && aec is None`), exactly as it already was when `barge_in_strategies` are configured: the `speech_start` marks the barge-in *pending* (the agent keeps talking) and the cancel only fires once `_handle_barge_in` / `handleBargeIn` sees a real transcript that survives the echo guard; if none confirms within `barge_in_confirm_ms` (default 1500 ms) the agent resumes its sentence. The default VAD path and forward-STT *with* AEC keep the responsive immediate cancel — no behaviour change for existing configs. For the cleanest short-echo handling, still pair with `echo_cancellation=True` or `barge_in_strategies`. `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
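
The echo guard entry above describes a check based on a substring match or at least 60% word overlap against the agent's in-flight spoken text. The sketch below shows one way such a check could look. The tokenisation, the whole-word substring rule, and the choice to measure overlap against the candidate's words are assumptions made for illustration; the real `_looks_like_echo` may differ.

```python
import re


def looks_like_echo(candidate, agent_text, threshold=0.6):
    """Guess whether a transcript is the agent's own speech echoing back."""

    def words(text):
        # \w+ is Unicode-aware in Python 3, so this is not English-specific.
        return re.findall(r"\w+", text.lower())

    heard = words(candidate)
    spoken = words(agent_text)
    if not heard or not spoken:
        return False

    # Substring rule: the heard words appear as a contiguous run of whole
    # words in what the agent is saying (padding avoids partial-word hits).
    if f" {' '.join(heard)} " in f" {' '.join(spoken)} ":
        return True

    # Overlap rule (assumed definition): share of heard words that the
    # agent also said, compared against the threshold.
    spoken_set = set(spoken)
    overlap = sum(1 for word in heard if word in spoken_set) / len(heard)
    return overlap >= threshold


agent_line = "One moment, let me check that."
echo_hit = looks_like_echo("let me check", agent_line)
real_interruption = looks_like_echo("stop please", agent_line)
```

A candidate that passes this check would be dropped before it can trigger a barge-in, while an unrelated utterance such as a real interruption falls through to the normal path.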

## 0.6.5 (2026-06-05)

### Added
25 changes: 4 additions & 21 deletions README.md
@@ -110,28 +110,11 @@ await phone.serve(agent, tunnel=True)

`tunnel: true` spawns a Cloudflare quick tunnel and points your number at it — ideal for local dev. For production, use a static `webhook_url` (or [ngrok](https://ngrok.com)); see [Tunneling](https://docs.getpatter.com).

## Anonymous Telemetry
## Telemetry

Patter collects **completely anonymous**, **opt-out** usage telemetry so the maintainers can see how the SDK is used in aggregate — which engines, providers, models, and carriers people choose — and prioritise accordingly. It is on by default, following the open-source norm (Next.js, Astro, Homebrew). **No data we collect is personally identifiable**, and none of it ever contains call content.

**What we collect** (coarse and bucketed):

- SDK version, language (Python/TS), OS family, CPU arch, and runtime version.
- A random anonymous install id (a UUID, not tied to you) and a per-run id, plus the upgrade funnel (previous → current version) and a first-run activation marker.
- Deploy shape: container / serverless / cloud / package-manager presence, and whether an AI coding agent invoked the SDK.
- The composed stack — provider vendor and a sanitized model token per layer (e.g. `anthropic-claude-haiku-4-5`, `deepgram-nova-3`).
- Agent shape: bucketed tool counts, integration category, and which coarse features are enabled.
- CLI commands invoked (the command name only) and per-call facts: inbound vs outbound, outcome, error code (the code, never the message), duration, latency, cost, and a bucketed turn count.

**What we never collect:** phone numbers, transcripts, audio, prompts, tool arguments, API keys, customer identifiers, IPs (dropped at the collector), hostnames, file paths, or any free text. Custom or self-hosted model names and custom tool names are structurally impossible to send — they collapse to a vendor bucket or `other` before anything leaves the process.

**Opt out** anytime — any one of:

- `Patter(telemetry=False)` / `new Patter({ telemetry: false })`
- `getpatter telemetry disable` (persisted; re-enable with `getpatter telemetry enable`)
- `PATTER_TELEMETRY_DISABLED=1`, or the cross-tool standard `DO_NOT_TRACK=1`

It is auto-disabled in CI and test runs. **Inspect exactly what would be sent, without sending it:** set `PATTER_TELEMETRY_DEBUG=1` (prints each event to stderr and sends nothing), or run `getpatter telemetry status`. Full details: [Telemetry](https://docs.getpatter.com/telemetry).
> **Note** Patter collects anonymous, opt-out usage data (SDK version, bucketed provider/model and call facts) to help us prioritise — never call content, prompts, phone numbers, keys, or free text.
>
> Opt out any time: `Patter(telemetry=False)` (`new Patter({ telemetry: false })`), `getpatter telemetry disable`, or `PATTER_TELEMETRY_DISABLED=1` (also honours `DO_NOT_TRACK=1`); auto-off in CI/tests. Inspect without sending: `PATTER_TELEMETRY_DEBUG=1`. Full details: [Telemetry](https://docs.getpatter.com/telemetry).
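
The opt-out switches listed above can be read as "any one of them disables telemetry". The sketch below illustrates that rule only. The function name, the `persisted_disabled` flag (standing in for `getpatter telemetry disable`), and the use of a `CI` environment variable to detect CI runs are assumptions, not the SDK's actual logic.

```python
import os


def telemetry_enabled(env, telemetry=True, persisted_disabled=False):
    """Return False if any single opt-out switch is set (illustrative only)."""
    # Constructor argument, e.g. Patter(telemetry=False).
    if not telemetry:
        return False
    # Stand-in for the persisted `getpatter telemetry disable` setting.
    if persisted_disabled:
        return False
    # Environment switches named in the README.
    if env.get("PATTER_TELEMETRY_DISABLED") == "1" or env.get("DO_NOT_TRACK") == "1":
        return False
    # Assumption: CI runs are detected through a non-empty CI variable.
    if env.get("CI"):
        return False
    return True


# Evaluate against the current process environment.
enabled_here = telemetry_enabled(dict(os.environ))
```

Taking the environment as a plain mapping keeps the check easy to test without mutating `os.environ`.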

## Templates

86 changes: 86 additions & 0 deletions docs/integrations/hermes.mdx
@@ -63,6 +63,23 @@ single turn can take **30–90 s**. That is why `HermesLLM` defaults to a **120
timeout (the generic provider's 60 s, raised for the preset) instead of the short ceiling
used for raw inference providers — a turn that runs a tool isn't cut off mid-thought.

Because a tool-running turn can leave the caller in **silence** for several seconds, the
agent supports an opt-in spoken **filler**: set `long_turn_message` / `longTurnMessage`
(with `long_turn_message_after_s` / `longTurnMessageAfterS`, default 4 s) and Patter speaks
a short line if no audio has reached the caller yet by then. It fires once per turn, only
on slowness, and never overlaps the real reply. (A separate `llm_error_message` /
`llmErrorMessage` covers the gateway-down / timeout **error** case.)

```python
agent = phone.agent(
    stt=DeepgramSTT(),
    llm=HermesLLM(),
    tts=ElevenLabsTTS(),
    long_turn_message="One moment, let me check that.",
    long_turn_message_after_s=4,
)
```
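
To make the timing rule concrete, here is a small simulation of the filler behaviour described above: wait up to `long_turn_message_after_s` for the first audio, and speak the filler only if nothing has reached the caller by then. The `run_turn` function, the `spoken` list, and the use of `asyncio.sleep` to stand in for a slow LLM turn are illustrative assumptions, not Patter's stream handler.

```python
import asyncio


async def run_turn(reply_delay_s, long_turn_message=None, long_turn_message_after_s=4.0):
    """Simulate one turn and return what the caller would hear, in order."""
    spoken = []
    audio_started = asyncio.Event()

    async def filler():
        try:
            # Wait for the first real audio, but only up to the threshold.
            await asyncio.wait_for(audio_started.wait(), timeout=long_turn_message_after_s)
        except asyncio.TimeoutError:
            # Gate on emitted audio so the filler never follows the reply.
            if not audio_started.is_set():
                spoken.append(long_turn_message)

    # Unset message means the feature is off: no timer is started at all.
    filler_task = asyncio.create_task(filler()) if long_turn_message else None

    await asyncio.sleep(reply_delay_s)  # stands in for a slow, tool-running turn
    audio_started.set()
    spoken.append("real reply")

    if filler_task is not None:
        await filler_task  # one task per turn, so the filler fires at most once
    return spoken


slow_turn = asyncio.run(run_turn(0.05, "One moment, let me check that.", 0.01))
fast_turn = asyncio.run(run_turn(0.01, "One moment, let me check that.", 0.2))
```

In the slow case the filler precedes the reply; in the fast case the reply arrives before the threshold and the filler is never spoken.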

<Note>
**Where the session lives.** Hermes is **stateless** and keys continuity off
**HTTP headers**, not the OpenAI `user` field. Each phone call maps to **one** Hermes
@@ -83,6 +100,15 @@ used for raw inference providers — a turn that runs a tool isn't cut off mid-t
const llm = new HermesLLM({ sessionKey: 'customer-42' });
```

For **per-caller memory without storing the raw phone number**, derive the key from a
caller hash instead of a static value — `HermesLLM(session_key_from="caller_hash")` /
`new HermesLLM({ sessionKeyFrom: 'caller_hash' })` emits
`X-Hermes-Session-Key: patter-caller-<hash>` (SHA-256, 16 hex chars), so Hermes
remembers a caller across calls while the raw number never reaches the wire or the
logs. For a custom scheme, pass `session_key_factory` / `sessionKeyFactory`, a callback
that receives a `SessionContext` (`call_id` / `caller` / `callee` / `caller_hash`) and
returns the scope value (a falsy return omits the header for that call).

(Patter also still sends `user=patter-call-<call_id>` for upstream-log correlation,
but that field is **not** what drives the Hermes session — the headers are.)
</Note>
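
As an illustration of the caller-hash scheme in the note above, the sketch below derives a `patter-caller-<hash>` value from a phone number using SHA-256 truncated to 16 hex characters. It assumes an unsalted hash of the raw number string with no normalisation; the SDK's own `hash_caller` may normalise or salt the input, so treat the function names and exact output here as hypothetical.

```python
import hashlib


def hash_caller_sketch(caller):
    """Return the first 16 hex characters of the SHA-256 of the caller string."""
    digest = hashlib.sha256(caller.encode("utf-8")).hexdigest()
    return digest[:16]


def session_key_for(caller):
    """Build the header value, or None so the header is omitted."""
    if not caller:
        # Mirrors the documented rule: a falsy return omits the header.
        return None
    return f"patter-caller-{hash_caller_sketch(caller)}"


key = session_key_for("+15551234567")
withheld = session_key_for("")
```

The same number always maps to the same key, so the gateway can scope memory per caller while only the truncated digest is sent.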
@@ -138,6 +164,66 @@ gateway that isn't listening.
`hermes-agent`).
</Note>

### Zero-config setup (Python)

If you'd rather not wire it up by hand, the Python CLI scaffolds a ready-to-run project,
checks your environment, and can point your Twilio number at Patter:

```bash
pip install getpatter

patter hermes doctor # preflight: gateway, providers, carrier — with fixes
patter hermes setup # scaffold ./hermes-phone-agent (app.py, .env, scripts)
```

`patter hermes doctor` reads your Hermes config directly — it autoloads `~/.hermes/.env`
and the nearest project `.env`, reports whether `API_SERVER_ENABLED` is set and which
gateway port is configured, runs `hermes gateway status` when the CLI is present, then
probes the gateway (`/v1/models`), confirms `HermesLLM` is constructible, and checks your
Deepgram / ElevenLabs / Twilio credentials — printing a suggested fix for anything missing
(`--no-network` skips live probes, `--json` for machine-readable output, `--env-file` /
`--no-env-file` to control autoloading).

`patter hermes setup` writes the same starter project shown in
[`examples/hermes-phone-agent`](https://github.com/PatterAI/Patter/tree/main/examples/hermes-phone-agent)
and can also wire the two ends together for you:

- `--enable-hermes` writes `API_SERVER_ENABLED=true` (and generates an `API_SERVER_KEY` if
absent) into `~/.hermes/.env`, backing the file up first — then reminds you to restart the
gateway. The same key is mirrored into the project `.env` so Patter and Hermes agree (a
mismatch is a 401 at call time).
- `--generate-key` puts a strong `API_SERVER_KEY` into the project `.env`.
- `--number` + `--url` attach the Twilio webhook in the same run.

To wire an existing number on its own:

```bash
patter hermes numbers # list the numbers on your Twilio account
patter hermes attach-number +15551234567 --url https://<your-tunnel>/calls/inbound
```

To go from a freshly enabled gateway to a verified one in a single run, add
`--start-gateway` — `setup` then runs `hermes gateway start` and waits for `/v1/models` to
answer before continuing. Before placing a real call, run the end-to-end acceptance check,
which sends an actual chat turn through the gateway (with the Hermes session header) and
confirms your providers are ready:

```bash
patter hermes test # /v1/models + a real /v1/chat/completions turn + provider keys
```

When a call misbehaves, point Patter's per-call log (`PATTER_LOG_DIR`) at the tracer to see
exactly which stage broke — carrier → STT → Hermes → TTS — with a latency breakdown and a
one-line verdict:

```bash
patter hermes trace # latest call's pipeline stages + stt/llm/tts latency
patter hermes diagnose # e.g. "Hermes replied but no audio — TTS stage" + the fix
```

These commands live in the Python SDK today; the `HermesLLM` provider itself is available
in both the Python and TypeScript SDKs.

### Running Patter locally

Build a pipeline-mode agent whose LLM is `HermesLLM`. Patter wraps the carrier, STT, and
23 changes: 23 additions & 0 deletions examples/hermes-phone-agent/.env.example
@@ -0,0 +1,23 @@
# ── Hermes gateway (the brain — keep it on loopback) ──────────────────
API_SERVER_ENABLED=true
API_SERVER_HOST=127.0.0.1
API_SERVER_PORT=8642
API_SERVER_KEY=choose-a-strong-key
API_SERVER_MODEL_NAME=hermes-agent

# ── Patter (the voice shell) ──────────────────────────────────────────
PATTER_PHONE_NUMBER=+15551234567
PATTER_LANGUAGE=en
# REST is the safer default for a first PSTN demo; set to ws for streaming.
PATTER_ELEVENLABS_TRANSPORT=rest
# Per-call logs — enables `patter hermes trace` / `patter hermes diagnose`.
PATTER_LOG_DIR=./patter-logs

# ── Twilio carrier ────────────────────────────────────────────────────
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your-twilio-auth-token

# ── STT / TTS providers ───────────────────────────────────────────────
DEEPGRAM_API_KEY=your-deepgram-key
ELEVENLABS_API_KEY=your-elevenlabs-key
# ELEVENLABS_VOICE_ID=EXAVITQu4vr4xnSDxMaL