Add patter hermes/openclaw CLIs, session-key factory, echo guard, and long-turn filler by nicolotognoni · Pull Request #161 · PatterAI/Patter

nicolotognoni · 2026-06-09T20:34:17Z

Summary

This PR adds comprehensive zero-config setup wizards for both Hermes and OpenClaw voice shells, implements per-call session-key scoping for durable memory, adds echo detection to prevent TTS bleed from triggering false barge-ins, and introduces an opt-in long-turn filler to speak during slow agent-runtime turns. The changes span Python and TypeScript with full parity.

Changes

New CLI Wizards (`patter hermes` / `patter openclaw`)

cli_hermes.py / cli_openclaw.py: Zero-config setup wizards with doctor (preflight checks), setup (project scaffold), test (acceptance probe), and Twilio wiring (attach-number / numbers). OpenClaw variant adds agent-roster awareness and guards against pointing at the default/master agent.
_cli_common.py: Shared provider-agnostic helpers (Check/Section result model, dotenv parsing, Twilio integration, chat-turn acceptance probe) to prevent drift between the two wizards.
_hermes_scaffold.py / _openclaw_scaffold.py: Single source of truth for starter project files. OpenClaw scaffold is mode-aware (inbound/outbound/both) and powers both directions with one agent.

Session-Key Factory (Feature #7)

models.py: Added SessionContext dataclass (call_id, caller, callee, caller_hash) and hash_caller() for non-reversible per-caller memory scoping.
openai_compatible.py (Python/TS): New session_key_factory parameter that derives the session-key header value per call from SessionContext, enabling durable memory scoped per caller without raw numbers on the wire.
hermes.py (Python/TS): Convenience session_key_from="caller_hash" preset that installs the default factory.
llm_loop.py: Updated to thread caller / callee into provider.stream() alongside existing call_id, with per-provider signature detection to maintain backward compatibility.
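
The session-key derivation described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `SessionContext` and `hash_caller` are stand-ins that reuse the names and fields from this PR, while `caller_hash_factory`, `build_headers`, the `caller:` key prefix, and the unsalted SHA-256 hex digest are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SessionContext:
    """Stand-in for the PR's SessionContext (same field names)."""
    call_id: str
    caller: str
    callee: str
    caller_hash: str


def hash_caller(caller: str) -> str:
    # Assumption: plain SHA-256 hex digest of the caller number.
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()


def caller_hash_factory(ctx: SessionContext) -> Optional[str]:
    # Hypothetical factory: a falsy return means "omit the header".
    return f"caller:{ctx.caller_hash}" if ctx.caller_hash else None


def build_headers(
    ctx: SessionContext,
    static_key: Optional[str] = None,
    factory: Optional[Callable[[SessionContext], Optional[str]]] = None,
) -> Dict[str, str]:
    # The factory, when present, takes precedence over the static key.
    key = factory(ctx) if factory is not None else static_key
    headers: Dict[str, str] = {}
    if key:
        headers["X-Hermes-Session-Key"] = key
    return headers
```

With this shape, the same caller maps to the same key on every call, and the raw number never appears in the header value.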

Echo Detection & Dedup (Residual Hermes/OpenClaw Fixes)

stream_handler.py (Python/TS): Added _looks_like_echo() / looksLikeEcho() to detect agent TTS bleeding into STT (substring match or high word-overlap), preventing false barge-ins on no-AEC links. Added _is_near_duplicate() / isNearDuplicate() for back-to-back dedup.
New unit tests: test_pipeline_echo_dedup.py (Python) and pipeline-echo-dedup.mocked.test.ts (TS) verify echo guard and dedup logic.

Long-Turn Filler (Feature #8)

client.py: New long_turn_message / long_turn_message_after_s agent parameters for opt-in filler during slow LLM turns (e.g., agent runtime running tools).
stream_handler.py: Filler task scheduling / cancellation in pipeline mode; speaks via the same per-sentence TTS primitive.
New unit tests: test_long_turn_filler.py (Python) and long-turn-filler.mocked.test.ts (TS) exercise real LLMLoop + filler scheduling with mocked provider timing.
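
A rough asyncio sketch of the filler scheduling described above, assuming a hypothetical `LongTurnFiller` helper and an injected `speak` coroutine (neither is the SDK's actual API). It fires at most once, only if no audio has reached the caller, and `clear()` waits for a filler that is already being spoken so it cannot overlap the first real sentence.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LongTurnFiller:
    """Hypothetical helper: speak a filler once if a turn stays silent too long."""

    def __init__(self, message: str, after_s: float,
                 speak: Callable[[str], Awaitable[None]]) -> None:
        self._message = message
        self._after_s = after_s
        self._speak = speak
        self._audio_emitted = False
        self._speaking = False
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Schedule the filler at the start of a turn.
        self._task = asyncio.create_task(self._run())

    def mark_audio_emitted(self) -> None:
        # Call when the first real audio for this turn is sent.
        self._audio_emitted = True

    async def _run(self) -> None:
        await asyncio.sleep(self._after_s)
        if self._audio_emitted:
            return
        self._speaking = True
        await self._speak(self._message)

    async def clear(self) -> None:
        task, self._task = self._task, None
        if task is None:
            return
        if not self._speaking:
            task.cancel()  # still waiting: drop the filler
        try:
            await task     # already speaking: let it finish first
        except asyncio.CancelledError:
            pass
```

A caller would `start()` the filler when the turn is dispatched, call `mark_audio_emitted()` on the first TTS chunk, and `await clear()` before speaking the first real sentence.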

Multi-Turn Turn-Taking Fixes

stream_handler.py: Fixed tail-grace misclassification (VAD speech_start during echo-tail grace period now rescues the new turn instead of mis-detecting as barge-in).
New unit tests: test_pipeline_multiturn_tail_grace.py (Python) and pipeline-multiturn-tail-grace.mocked.test.ts (TS) reproduce and verify the live-call failure where subsequent turns went silent.

Backgrounded Barge-In

https://claude.ai/code/session_018Xbkscpu4DBb8cCRCj4zqV

… long-turn filler

Three opt-in developer-experience improvements for the agent-LLM providers, full Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` -> `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext + hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static session_key; a falsy return omits the header. The loop dispatch was generalised to thread caller/callee only to providers whose stream() declares them (or **kwargs); built-in and minimal custom providers are unchanged. An unknown session_key_from raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when a turn is slow and no audio has reached the caller yet, distinct from llm_error_message (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async clear() that awaits an in-flight filler so it can never overlap the real sentence.

Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback could overlap the first real sentence; Python's asyncio path was immune).

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.

… per-turn cancel-event reset

Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'. Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity:

1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms, inside that window, so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn, recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight.
2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response, AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn, so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm.

Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
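
The tail-grace distinction described in this commit can be illustrated with a small state sketch. `TurnTakingState` and its method names are hypothetical, timers and audio-ring recovery are left out, and the real handler tracks far more state. The point is only that a speech start during the post-TTS grace window is classified as a new turn, while one during active TTS remains a barge-in.

```python
from dataclasses import dataclass


@dataclass
class TurnTakingState:
    """Hypothetical, timer-free model of the two flags described above."""
    is_speaking: bool = False        # set while TTS streams AND during the grace
    tail_grace_active: bool = False  # set only during the post-TTS echo guard

    def start_tts(self) -> None:
        self.is_speaking = True
        self.tail_grace_active = False

    def end_speaking_with_grace(self) -> None:
        # TTS finished; is_speaking stays set so the fading echo tail is ignored.
        self.tail_grace_active = True

    def grace_expired(self) -> None:
        # In a real pipeline a timer would call this after the grace period.
        if self.tail_grace_active:
            self.is_speaking = False
            self.tail_grace_active = False

    def on_speech_start(self) -> str:
        if not self.is_speaking:
            return "new_turn"
        if self.tail_grace_active:
            # Caller replied inside the grace window: end it, dispatch a clean turn.
            self.is_speaking = False
            self.tail_grace_active = False
            return "new_turn"
        return "barge_in"
```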

…t-token abort (Hermes/OpenClaw)

The caller could not interrupt the agent mid-response. The STT receive loop awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` / `await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw tool-running turn it stopped reading transcripts: a barge-in transcript ("ferma") was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD, the transcript path is the only barge-in fallback and it was structurally dead. Three coordinated changes, full Python/TypeScript parity:

1. Decoupled single-in-flight dispatch. The turn runs as one tracked background task (_dispatch_task / dispatchTask) so the receive loop keeps draining transcripts and runs handleBargeIn against the LIVE turn. The loop settles the previous dispatch before launching the next (single-in-flight), so conversation_history / metrics ordering is unchanged; the loop still awaits the final turn to settle before returning, so existing tests that inspect state right after the loop are unaffected.
2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of seconds before the first token, during which the per-chunk cancel_event check never runs. The provider now races create()+first-byte against the cancel signal and spawns a watchdog that close()s the response the instant a barge-in fires (TS already aborts promptly via fetch + AbortController). The VAD legacy barge-in branch now also sets _llm_cancel_event (previously it only flipped _is_speaking, which Hermes never observed pre-first-token), and the OpenAI-compatible client uses an explicit httpx read/connect timeout so a dead gateway fails fast.
3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound audio to STT during TTS even with a VAD configured, so the transcript barge-in path can receive a transcript on echo-masked links where the VAD never fires. The leading-edge ring is still captured. Echo caveat (WARN on enable): without AEC the agent's own voice may be transcribed as a phantom interruption; pair with agent.barge_in_strategies.

Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the just-landed tail-grace multi-turn fix is preserved.

Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3), pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass; tsc + build clean.
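
The decoupled single-in-flight dispatch from item 1 can be sketched as below. The `receive_loop` and `run_turn` names are hypothetical, and the sketch simplifies by treating every transcript that arrives during a live turn as a barge-in (the real handler applies its gating first). It shows the loop continuing to read transcripts while a turn runs, settling the previous turn before launching the next, and awaiting the final turn before returning.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional


async def receive_loop(
    transcripts: AsyncIterator[str],
    run_turn: Callable[[str, asyncio.Event], Awaitable[None]],
) -> None:
    dispatch_task: Optional[asyncio.Task] = None
    cancel_event: Optional[asyncio.Event] = None

    async for text in transcripts:
        if dispatch_task is not None:
            if not dispatch_task.done() and cancel_event is not None:
                # A transcript arrived while the turn is live: signal a barge-in.
                cancel_event.set()
            # Single-in-flight: settle the previous turn before the next one.
            await dispatch_task
        # Fresh cancel event per turn, created BEFORE the turn is dispatched.
        cancel_event = asyncio.Event()
        dispatch_task = asyncio.create_task(run_turn(text, cancel_event))

    if dispatch_task is not None:
        # Settle the final turn so callers can inspect state after the loop.
        await dispatch_task
```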

…-on-cancel, bounded teardown

Three defects found by adversarial review of the previous commit's decoupled-dispatch barge-in, all fixed with full Python/TS parity:

1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the turn dispatch backgrounded, a following transcript's user push (on the drain loop while the turn is in flight) could land in the in-flight turn's prompt before buildMessages read it, conflating two turns. Now a history SNAPSHOT is captured at launch and threaded through dispatchTurn → runPipelineLlm → llmLoop.run (and the onMessage/webhook paths), mirroring Python's list(self.conversation_history). Regression test added.
2. (MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked pre-first-token, asyncio.wait did not cancel the in-flight create() POST, orphaning the Hermes/OpenClaw connection ("Task exception was never retrieved"). _open_stream_with_cancel now catches CancelledError and aborts the create task. Test added.
3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with no timeout, so a hung user onMessage (no AbortSignal) could block call teardown indefinitely. Teardown now bounds the wait via settleDispatchForTeardown (DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task.

Python 2220 / TypeScript 1766 pass; tsc + build clean.
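
The Python fix in item 2 relies on a general asyncio pattern: race the request against a cancel signal, and make sure the request task is cancelled on every exit path, including an outer hard-cancel. A minimal sketch follows; `open_stream_with_cancel` is a stand-in for the helper named above, not its actual code, and `create` is a hypothetical zero-argument coroutine function.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def open_stream_with_cancel(
    create: Callable[[], Awaitable[Any]],
    cancel_event: asyncio.Event,
) -> Optional[Any]:
    """Return the opened stream, or None if the cancel signal wins the race."""
    create_task = asyncio.ensure_future(create())
    cancel_task = asyncio.ensure_future(cancel_event.wait())
    try:
        await asyncio.wait(
            {create_task, cancel_task},
            return_when=asyncio.FIRST_COMPLETED,
        )
    except asyncio.CancelledError:
        # asyncio.wait does not cancel its awaitables when the caller is
        # cancelled, so abort the in-flight request explicitly.
        create_task.cancel()
        raise
    finally:
        cancel_task.cancel()

    if create_task.done():
        return create_task.result()
    # Barge-in fired before the first byte: abandon the request.
    create_task.cancel()
    return None
```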

…ollow-up, mark interrupted turns

Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1, no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu l'hai", the agent's own Italian TTS echoing into Deepgram, not caught by the English-only hallucination filter), the real follow-up was dropped leaving an empty [interrupted] turn, and the post-barge-in context was poisoned. A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure) confirmed this is NOT an interruptibility problem: the abort already works (bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full Python/TS parity:

1. Echo guard (language-agnostic). Track the agent's in-flight spoken text (_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo / looksLikeEcho (substring OR >=60% word overlap) drops any barge-in (_handle_barge_in) or commit (_commit_transcript) that is the agent's own TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the default VAD path and real post-turn replies are unaffected.
2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE of the previous final (Deepgram speech_final+is_final for the same utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different fast follow-up is no longer swallowed into an empty [interrupted] turn.
3. Interrupted-turn context rewrite. On a confirmed mid-turn barge-in the spoken prefix is appended to history with an "[interrupted by caller]" marker, so a stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn that it was cut off and what the caller actually heard.

Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate).

Recommended caller config (unchanged SDK defaults): barge_in_strategies=(MinWordsStrategy(min_words=2),), echo_cancellation=True.

Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11); updated the back-to-back dedup tests to the corrected behaviour. Python 2236 / TypeScript 1777 pass; tsc + build clean.
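
A minimal sketch of an echo check in the spirit of the guard above: substring match OR at least 60% word overlap against the agent's in-flight spoken text. The function name mirrors the PR, but the tokenisation, the word-aligned substring test, and the constants are assumptions for illustration. The four-word minimum that a later commit in this PR adds is included so short answers are never classified as echo.

```python
import re
from typing import List

ECHO_MIN_CANDIDATE_WORDS = 4   # minimum-word gate described later in this PR
ECHO_OVERLAP_THRESHOLD = 0.6   # ">=60% word overlap"


def _words(text: str) -> List[str]:
    # Assumption: lowercase, Unicode-aware word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def looks_like_echo(candidate: str, agent_spoken: str) -> bool:
    cand = _words(candidate)
    spoken = _words(agent_spoken)
    if len(cand) < ECHO_MIN_CANDIDATE_WORDS or not spoken:
        return False
    # Word-aligned substring: the candidate appears verbatim in the agent's text.
    if f" {' '.join(cand)} " in f" {' '.join(spoken)} ":
        return True
    # Otherwise: share of candidate words that the agent also spoke.
    spoken_set = set(spoken)
    overlap = sum(1 for word in cand if word in spoken_set) / len(cand)
    return overlap >= ECHO_OVERLAP_THRESHOLD
```

For example, a transcript repeating most of the agent's current sentence is flagged, while a one-word answer such as "lunedì" is passed through regardless of overlap.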

…t replies, word-boundary dedup, clean interrupted metrics

Adversarial review of the echo-safe barge-in commit found three real HIGH false-positive risks; all fixed with full Python/TS parity:

1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller "lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is a long near-complete fragment, not a 1-3 word reply. The echo guard now requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a candidate as echo, so short answers are never dropped. (Short echo blips on a no-AEC link are left to AEC / barge_in_strategies.)
2. (HIGH) Back-to-back dedup used a character-level substring test, so a genuinely different short follow-up was dropped ("no" matched inside "nothing else"), and this ran on the DEFAULT path (not gated on the echo flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is now word-boundary aware (equal, or a true word-prefix double-emit), so "nothing else" is no longer a duplicate of "no" while Deepgram's speech_final+is_final pair still de-duplicates.
3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to recordTtsComplete/recordTurnComplete. runPipelineLlm now returns { text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated on !interrupted) and applies the marker to the history/transcript only, mirroring Python, where metrics are recorded before the marker is appended.

Tests updated to the corrected behaviour (>=4-word echo examples + explicit short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript 1779 pass; tsc + build clean.
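
The word-boundary dedup in item 2 can be sketched as a comparison of word lists instead of characters. This is an illustrative stand-in for the helper named above. Treating the prefix relation as symmetric (either utterance may be the shorter one) is an assumption, and the <500 ms timing gate is omitted.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    # Assumption: lowercase, Unicode-aware word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def is_near_duplicate(current: str, previous: str) -> bool:
    """True if the two finals are equal, or one is a word-prefix of the other."""
    cur, prev = _words(current), _words(previous)
    if not cur or not prev:
        return False
    shorter, longer = sorted((cur, prev), key=len)
    # Compare whole words, so "no" is not a prefix of "nothing else".
    return longer[: len(shorter)] == shorter
```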

… VAD cancel to transcript On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no barge_in_strategies, a VAD speech_start during TTS cancelled the turn immediately. But that speech_start is very often the agent's own TTS echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw turn), so the agent self-interrupted almost every turn: a short normal reply "bene bene" produced agent_text='[interrupted]', and the next turn ran the LLM for seconds yet emitted tts_characters=0 (torn down before its first token). The echo guard only protected the transcript path; the raw VAD-energy cancel had none. Defer the VAD-energy cancel to transcript confirmation whenever forward_stt_while_speaking && aec is None — exactly as it already worked when barge_in_strategies are configured. The speech_start now marks the barge-in PENDING (agent keeps talking); the cancel fires only on a real transcript that survives the echo guard, else the agent resumes after barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH AEC keep the responsive immediate cancel — no behaviour change for existing configs. Full Python/TS parity. New tests drive the VAD path through on_audio_received / handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels immediately; a real transcript confirms, an echo transcript does not.

…ch-number + example app

Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback.

New `patter hermes` CLI group (Python):

- `doctor`: preflight across the Hermes gateway (/v1/models reachability + model presence), the Patter providers (HermesLLM constructible, Deepgram / ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier (creds valid, number webhook). Each problem prints a suggested fix. `--no-network` skips live probes, `--json` for machine-readable output.
- `setup`: scaffold a ready-to-run hermes-phone-agent project, run the checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive with `--yes`.
- `attach-number` / `numbers`: point a Twilio number's voice webhook at your Patter URL / list account numbers.

Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the committed `examples/hermes-phone-agent/` project (app.py, .env.example, README, docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in sync. The example defaults to REST ElevenLabs TTS and caller-hash memory.

TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval` stub); the HermesLLM provider stays available in both SDKs. Docs updated with a zero-config setup section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

…mes detection, key-gen, --enable-hermes

Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes real config instead of only consulting os.environ.

doctor:

- Autoloads dotenv files before checking: ~/.hermes/.env then the project/cwd .env (non-overriding), with --env-file/--no-env-file to control it. Loaded paths are reported; secrets are never echoed.
- Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED, surfaces the configured key/port/model, and runs `hermes gateway status` when the CLI is present.
- Sharper severity: CLI missing AND gateway unreachable is now a failure, not a soft warning; gateway-down fix adapts to whether the CLI is available.

setup:

- --enable-hermes writes API_SERVER_ENABLED=true (and generates an API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first, then reminds the operator to restart the gateway.
- --generate-key writes a strong key into the project .env; when used with --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a mismatch is a 401 at call time).
- Autoloads env for the preflight so checks reflect the project's .env.

New helpers (_parse_env_file / _upsert_env_file / _load_env_files / _read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps. +11 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
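
The `--enable-hermes` behaviour described above (back up, upsert, generate a key only if absent) might look roughly like this. The function names echo the helpers listed in the commit but are stand-ins written for this example, and the parsing rules and key length are assumptions.

```python
import secrets
import shutil
from pathlib import Path
from typing import Dict


def parse_env_file(path: Path) -> Dict[str, str]:
    """Read KEY=VALUE pairs, skipping blanks and comments (simplified parser)."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def upsert_env_file(path: Path, updates: Dict[str, str]) -> None:
    """Update existing keys in place, append new ones, keep other lines intact."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if path.exists():
        # Back up first (e.g. .env -> .env.bak), as the commit describes.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    remaining = dict(updates)
    out = []
    for raw in lines:
        is_pair = "=" in raw and not raw.lstrip().startswith("#")
        key = raw.split("=", 1)[0].strip() if is_pair else None
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(raw)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    path.write_text("\n".join(out) + "\n", encoding="utf-8")


def enable_gateway(env_path: Path) -> str:
    """Set API_SERVER_ENABLED=true; generate API_SERVER_KEY only if absent."""
    updates = {"API_SERVER_ENABLED": "true"}
    if not parse_env_file(env_path).get("API_SERVER_KEY"):
        updates["API_SERVER_KEY"] = secrets.token_urlsafe(32)
    upsert_env_file(env_path, updates)
    return parse_env_file(env_path)["API_SERVER_KEY"]
```

Returning the key lets a caller mirror the same value into the project `.env`, which is the agreement the commit calls out as necessary to avoid a 401 at call time.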

… trace/diagnose

Close the acceptance + debugging gaps so a green run means a real call works.

setup:

- --start-gateway runs `hermes gateway start` then polls /v1/models until the gateway answers, completing the enable → start → verify cycle.

New `patter hermes test` (acceptance, not just preflight): GET /v1/models, send a real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS keys. Exit non-zero on any blocker.

New `patter hermes trace [call]` / `diagnose [call]`: read the on-disk per-call log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose` applies a decision tree and names the first broken stage with a fix, e.g. "Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST transport." Defaults to the latest call; accepts a call_id or a directory.

Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by the SDK: serve() auto-configures the Twilio/Plivo webhook once the tunnel is up (server.py), so the scaffold app does it on `python app.py`; documented.

Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

Bring in the anonymous opt-out telemetry work (schema v5: stack/cost/install-id, deploy-shape, feature-adoption, upgrade funnel, CLI usage, call funnel, and the `getpatter telemetry status|disable|enable` command) that landed on main after this branch was cut.

Conflicts resolved:

- cli.py / cli.ts: keep both the `hermes` wizard and the `telemetry` command.
- CHANGELOG.md: telemetry "Added" entries placed under Added, above Fixed.
- README.md: replaced the long "Anonymous Telemetry" section with the short opt-out note requested earlier, and removed the duplicate section.

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

…example

Mirror the Hermes DX layer for OpenClaw's multi-agent gateway. One scoped OpenClaw agent is the brain; Patter is the voice shell for both directions (phone.serve = inbound receptionist, a supervised phone.call loop = outbound 24/7 dialer).

- cli_openclaw.py: doctor/setup/test/call/agents/attach-number/numbers, with agent enumeration, a default/master-agent guard, JSON5 endpoint enable, a day-1 security section, and inbound/outbound modes.
- _openclaw_scaffold.py + examples/openclaw-phone-agent/: app.py (serve), dialer.py (supervised 24/7 call loop), a shared agent.py builder, and deploy/ systemd + launchd units for always-on operation.
- _cli_common.py: factor the provider-agnostic CLI helpers out of cli_hermes (shared, no behaviour change; the 39 Hermes CLI tests stay green).
- cli.ts: a `getpatter openclaw` stub (OpenClawLLM runtime is already in both SDKs).
- docs/integrations/openclaw.mdx: a zero-config CLI quickstart (both flows).
- tests: test_openclaw_cli.py (27).

https://claude.ai/code/session_018Xbkscpu4DBb8cCRCj4zqV

# Conflicts:
# CHANGELOG.md
# libraries/python/getpatter/llm/openai_compatible.py
# libraries/python/getpatter/stream_handler.py
# libraries/typescript/src/stream-handler.ts
# libraries/typescript/tests/long-turn-filler.mocked.test.ts

nicolotognoni and others added 14 commits June 5, 2026 19:28

docs(readme): condense telemetry note to a short opt-out callout

1b575ee

https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts

mintlify Bot deployed to staging - docs June 9, 2026 21:07

nicolotognoni mentioned this pull request Jun 9, 2026

Feature #6–#8: Hermes CLI, session-key factory, long-turn filler, echo guard #162

Closed

nicolotognoni merged commit 5a3d89c into main Jun 9, 2026
10 checks passed

github-actions Bot deleted the claude/nice-planck-bv96no branch June 10, 2026 08:09

FrancescoRosciano mentioned this pull request Jun 10, 2026

chore(release): 0.6.6 #166

Merged

4 tasks

nicolotognoni merged 14 commits into main from claude/nice-planck-bv96no
