Add patter hermes/openclaw CLIs, session-key factory, echo guard, and long-turn filler#161
Merged
Conversation
… long-turn filler
Three opt-in developer-experience improvements for the agent-LLM providers, full
Python/TypeScript parity.
- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` ->
`new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the
X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext +
hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the
raw phone number ever reaching the wire or the logs. The factory takes precedence over
the static session_key; a falsy return omits the header. The loop dispatch was
generalised to thread caller/callee only to providers whose stream() declares them (or
**kwargs) — built-in and minimal custom providers unchanged. An unknown session_key_from
raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when
a turn is slow and no audio has reached the caller yet — distinct from llm_error_message
(which fires on error). Fires once, gated on emitted audio; the TS timer is serialised
via an async clear() that awaits an in-flight filler so it can never overlap the real
sentence.
Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback
could overlap the first real sentence; Python's asyncio path was immune).
Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
… per-turn cancel-event reset Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'. Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity: 1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms — inside that window — so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered. New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn — recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight. 2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response — AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn — so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm. Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
…t-token abort (Hermes/OpenClaw)
The caller could not interrupt the agent mid-response. The STT receive loop
awaited the turn's LLM+TTS dispatch inline (`await self._dispatch_turn(...)` /
`await this.runPipelineLlm(...)`), so during a long (30-90 s) Hermes/OpenClaw
tool-running turn it stopped reading transcripts — a barge-in transcript ("ferma")
was only processed AFTER the turn ended. On PSTN with echo-masked/unreliable VAD,
the transcript path is the only barge-in fallback and it was structurally dead.
Three coordinated changes, full Python/TypeScript parity:
1. Decoupled single-in-flight dispatch. The turn runs as one tracked background
task (_dispatch_task / dispatchTask) so the receive loop keeps draining
transcripts and runs handleBargeIn against the LIVE turn. The loop settles
the previous dispatch before launching the next (single-in-flight), so
conversation_history / metrics ordering is unchanged; the loop still awaits
the final turn to settle before returning, so existing tests that inspect
state right after the loop are unaffected.
2. Prompt pre-first-token abort (Python). Agent runtimes run tools for tens of
seconds before the first token, during which the per-chunk cancel_event
check never runs. The provider now races create()+first-byte against the
cancel signal and spawns a watchdog that close()s the response the instant a
barge-in fires (TS already aborts promptly via fetch + AbortController). The
VAD legacy barge-in branch now also sets _llm_cancel_event (previously it
only flipped _is_speaking, which Hermes never observed pre-first-token), and
the OpenAI-compatible client uses an explicit httpx read/connect timeout so a
dead gateway fails fast.
3. PATTER_FORWARD_STT_WHILE_SPEAKING (opt-in, default off). Forwards inbound
audio to STT during TTS even with a VAD configured, so the transcript
barge-in path can receive a transcript on echo-masked links where the VAD
never fires. The leading-edge ring is still captured. Echo caveat (WARN on
enable): without AEC the agent's own voice may be transcribed as a phantom
interruption — pair with agent.barge_in_strategies.
Default behaviour (flag off, VAD present, normal LLM) is byte-identical; the
just-landed tail-grace multi-turn fix is preserved.
Tests: new test_pipeline_bargein_backgrounded.py (4), test_provider_prefirsttoken_abort.py (3),
pipeline-bargein-backgrounded.mocked.test.ts (2). Python 2219 / TypeScript 1765 pass;
tsc + build clean.
…-on-cancel, bounded teardown
Three defects found by adversarial review of the previous commit's
decoupled-dispatch barge-in, all fixed with full Python/TS parity:
1. (HIGH, TS) Per-turn history was passed to the LLM by LIVE reference. With the
turn dispatch backgrounded, a following transcript's user push (on the drain
loop while the turn is in flight) could land in the in-flight turn's prompt
before buildMessages read it — conflating two turns. Now a history SNAPSHOT
is captured at launch and threaded through dispatchTurn → runPipelineLlm →
llmLoop.run (and the onMessage/webhook paths), mirroring Python's
list(self.conversation_history). Regression test added.
2. (MEDIUM, Python) On cleanup/hangup hard-cancel while the provider was parked
pre-first-token, asyncio.wait did not cancel the in-flight create() POST,
orphaning the Hermes/OpenClaw connection ("Task exception was never
retrieved"). _open_stream_with_cancel now catches CancelledError and aborts
the create task. Test added.
3. (MEDIUM, TS) handleStop/handleWsClose awaited the backgrounded dispatch with
no timeout — a hung user onMessage (no AbortSignal) could block call teardown
indefinitely. Teardown now bounds the wait via settleDispatchForTeardown
(DISPATCH_SETTLE_TIMEOUT_MS = 30s); Python hard-cancels the task.
Python 2220 / TypeScript 1766 pass; tsc + build clean.
…ollow-up, mark interrupted turns
Residual Hermes/OpenClaw barge-in failure (live test, PATTER_FORWARD_STT_WHILE_SPEAKING=1,
no AEC, no barge_in_strategies): barge-in fired on a PHANTOM transcript ("che tu
l'hai" — the agent's own Italian TTS echoing into Deepgram, not caught by the
English-only hallucination filter), the real follow-up was dropped leaving an
empty [interrupted] turn, and the post-barge-in context was poisoned.
A workflow root-cause (code trace + web research: Coval/Pipecat/LiveKit/Azure)
confirmed this is NOT an interruptibility problem — the abort already works
(bargein_ms=1.0). It is a GATE + ECHO + CONTEXT-REWRITE problem. Fixes, full
Python/TS parity:
1. Echo guard (language-agnostic). Track the agent's in-flight spoken text
(_current_agent_spoken_text / currentAgentSpokenText). A new _looks_like_echo
/ looksLikeEcho (substring OR >=60% word overlap) drops any barge-in
(_handle_barge_in) or commit (_commit_transcript) that is the agent's own
TTS echoing back. Active ONLY while _forward_stt_while_speaking, so the
default VAD path and real post-turn replies are unaffected.
2. Back-to-back dedup fix. The <500ms drop now applies only to a NEAR-DUPLICATE
of the previous final (Deepgram speech_final+is_final for the same
utterance), via _is_near_duplicate / isNearDuplicate. A genuinely different
fast follow-up is no longer swallowed into an empty [interrupted] turn.
3. Interrupted-turn context rewrite. On a confirmed mid-turn barge-in the spoken
prefix is appended to history with an "[interrupted by caller]" marker, so a
stateful agent runtime (Hermes/OpenClaw, X-Hermes-Session-Id) sees next turn
that it was cut off and what the caller actually heard.
Plus: fixed the stale _can_barge_in docstring (0.25 -> 0.5 s no-AEC gate).
Recommended caller config (unchanged SDK defaults): barge_in_strategies=
(MinWordsStrategy(min_words=2),), echo_cancellation=True.
Tests: test_pipeline_echo_dedup.py (19) + pipeline-echo-dedup.mocked.test.ts (11);
updated the back-to-back dedup tests to the corrected behaviour. Python 2236 /
TypeScript 1777 pass; tsc + build clean.
…t replies, word-boundary dedup, clean interrupted metrics
Adversarial review of the echo-safe barge-in commit found three real HIGH
false-positive risks; all fixed with full Python/TS parity:
1. (HIGH) Echo guard could silently drop a legitimate SHORT caller answer that
repeats the agent's offered words (e.g. agent "lunedì o martedì?", caller
"lunedì" → substring match → dropped, caller goes unheard). Real TTS echo is
a long near-complete fragment, not a 1-3 word reply. The echo guard now
requires >= _ECHO_MIN_CANDIDATE_WORDS (4) words before classifying a
candidate as echo, so short answers are never dropped. (Short echo blips on a
no-AEC link are left to AEC / barge_in_strategies.)
2. (HIGH) Back-to-back dedup used a character-level substring test, so a
genuinely different short follow-up was dropped ("no" matched inside
"nothing else") — and this ran on the DEFAULT path (not gated on the echo
flag), affecting all pipeline users. _is_near_duplicate / isNearDuplicate is
now word-boundary aware (equal, or a true word-prefix double-emit), so
"nothing else" is no longer a duplicate of "no" while Deepgram's
speech_final+is_final pair still de-duplicates.
3. (HIGH, TS) The interrupted-turn "[interrupted by caller]" marker leaked into
metrics: runPipelineLlm returned the marked text and dispatchTurn fed it to
recordTtsComplete/recordTurnComplete. runPipelineLlm now returns
{ text, interrupted }; dispatchTurn records metrics on the PLAIN text (gated
on !interrupted) and applies the marker to the history/transcript only —
mirroring Python, where metrics are recorded before the marker is appended.
Tests updated to the corrected behaviour (>=4-word echo examples + explicit
short-answer-exemption + word-boundary dedup cases). Python 2237 / TypeScript
1779 pass; tsc + build clean.
… VAD cancel to transcript On a no-AEC link with PATTER_FORWARD_STT_WHILE_SPEAKING and no barge_in_strategies, a VAD speech_start during TTS cancelled the turn immediately. But that speech_start is very often the agent's own TTS echo (or pre-first-token line noise on a long tool-running Hermes/OpenClaw turn), so the agent self-interrupted almost every turn: a short normal reply "bene bene" produced agent_text='[interrupted]', and the next turn ran the LLM for seconds yet emitted tts_characters=0 (torn down before its first token). The echo guard only protected the transcript path; the raw VAD-energy cancel had none. Defer the VAD-energy cancel to transcript confirmation whenever forward_stt_while_speaking && aec is None — exactly as it already worked when barge_in_strategies are configured. The speech_start now marks the barge-in PENDING (agent keeps talking); the cancel fires only on a real transcript that survives the echo guard, else the agent resumes after barge_in_confirm_ms (default 1500ms). Default VAD path and forward-STT WITH AEC keep the responsive immediate cancel — no behaviour change for existing configs. Full Python/TS parity. New tests drive the VAD path through on_audio_received / handleAudio: no-AEC+no-strategies defers to pending; AEC on still cancels immediately; a real transcript confirms, an echo transcript does not.
…ch-number + example app Make standing up the Hermes voice shell (Direction A) copy-paste simple, on par with wiring a hosted custom-LLM voice agent but keeping Hermes on loopback. New `patter hermes` CLI group (Python): - `doctor` — preflight across the Hermes gateway (/v1/models reachability + model presence), the Patter providers (HermesLLM constructible, Deepgram / ElevenLabs keys, ElevenLabs transport, Silero VAD), and the Twilio carrier (creds valid, number webhook). Each problem prints a suggested fix. `--no-network` skips live probes, `--json` for machine-readable output. - `setup` — scaffold a ready-to-run hermes-phone-agent project, run the checks, optionally attach a Twilio number (`--number`/`--url`). Non-interactive with `--yes`. - `attach-number` / `numbers` — point a Twilio number's voice webhook at your Patter URL / list account numbers. Scaffold (`getpatter/_hermes_scaffold.py`) is the single source of truth for the committed `examples/hermes-phone-agent/` project (app.py, .env.example, README, docker-compose, doctor/text-turn/outbound-call scripts); a test keeps them in sync. The example defaults to REST ElevenLabs TTS and caller-hash memory. TS CLI gains a `hermes` stub pointing to the Python wizard (mirrors the `eval` stub); the HermesLLM provider stays available in both SDKs. Docs updated with a zero-config setup section. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
…mes detection, key-gen, --enable-hermes Address the review gaps on the Hermes wizard: it now reads and (opt-in) writes real config instead of only consulting os.environ. doctor: - Autoloads dotenv files before checking — ~/.hermes/.env then the project/cwd .env (non-overriding), with --env-file/--no-env-file to control it. Loaded paths are reported; secrets are never echoed. - Reads ~/.hermes/.env + config.yaml directly: reports API_SERVER_ENABLED, surfaces the configured key/port/model, and runs `hermes gateway status` when the CLI is present. - Sharper severity: CLI missing AND gateway unreachable is now a failure, not a soft warning; gateway-down fix adapts to whether the CLI is available. setup: - --enable-hermes writes API_SERVER_ENABLED=true (and generates an API_SERVER_KEY if absent) into ~/.hermes/.env, backing up to .env.bak first, then reminds the operator to restart the gateway. - --generate-key writes a strong key into the project .env; when used with --enable-hermes the SAME key is mirrored so Patter and Hermes agree (a mismatch is a 401 at call time). - Autoloads env for the preflight so checks reflect the project's .env. New helpers (_parse_env_file / _upsert_env_file / _load_env_files / _read_hermes_config / _enable_hermes_gateway / _generate_key), no new deps. +11 unit tests; docs updated. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
… trace/diagnose Close the acceptance + debugging gaps so a green run means a real call works. setup: - --start-gateway runs `hermes gateway start` then polls /v1/models until the gateway answers, completing the enable → start → verify cycle. New `patter hermes test` — acceptance, not just preflight: GET /v1/models, send a real /v1/chat/completions turn with the X-Hermes-Session-Id header and report the latency + reply snippet, confirm HermesLLM is constructible, and check the STT/TTS keys. Exit non-zero on any blocker. New `patter hermes trace [call]` / `diagnose [call]` — read the on-disk per-call log (PATTER_LOG_DIR; services/call_log.py) and classify the pipeline stage by stage (carrier → STT → Hermes → TTS), with a latency breakdown. `diagnose` applies a decision tree and names the first broken stage with a fix, e.g. "Hermes replied but no audio — TTS stage. Check ELEVENLABS_API_KEY / REST transport." Defaults to the latest call; accepts a call_id or a directory. Note: item #3 (auto-attach the tunnel URL to the carrier) is already handled by the SDK — serve() auto-configures the Twilio/Plivo webhook once the tunnel is up (server.py) — so the scaffold app does it on `python app.py`; documented. Scaffold now sets PATTER_LOG_DIR and documents test/trace/diagnose; example dir regenerated. TS CLI stub lists the new subcommands. +15 unit tests; docs updated. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
Bring in the anonymous opt-out telemetry work (schema v5: stack/cost/install-id, deploy-shape, feature-adoption, upgrade funnel, CLI usage, call funnel, and the `getpatter telemetry status|disable|enable` command) that landed on main after this branch was cut. Conflicts resolved: - cli.py / cli.ts: keep both the `hermes` wizard and the `telemetry` command. - CHANGELOG.md: telemetry "Added" entries placed under Added, above Fixed. - README.md: replaced the long "Anonymous Telemetry" section with the short opt-out note requested earlier, and removed the duplicate section. https://claude.ai/code/session_01TNysNGx7woXM99fHBjpsts
…example Mirror the Hermes DX layer for OpenClaw's multi-agent gateway. One scoped OpenClaw agent is the brain; Patter is the voice shell for both directions (phone.serve = inbound receptionist, a supervised phone.call loop = outbound 24/7 dialer). - cli_openclaw.py: doctor/setup/test/call/agents/attach-number/numbers, with agent enumeration, a default/master-agent guard, JSON5 endpoint enable, a day-1 security section, and inbound/outbound modes. - _openclaw_scaffold.py + examples/openclaw-phone-agent/: app.py (serve), dialer.py (supervised 24/7 call loop), a shared agent.py builder, and deploy/ systemd + launchd units for always-on operation. - _cli_common.py: factor the provider-agnostic CLI helpers out of cli_hermes (shared, no behaviour change; the 39 Hermes CLI tests stay green). - cli.ts: a `getpatter openclaw` stub (OpenClawLLM runtime is already in both SDKs). - docs/integrations/openclaw.mdx: a zero-config CLI quickstart (both flows). - tests: test_openclaw_cli.py (27). https://claude.ai/code/session_018Xbkscpu4DBb8cCRCj4zqV
# Conflicts: # CHANGELOG.md # libraries/python/getpatter/llm/openai_compatible.py # libraries/python/getpatter/stream_handler.py # libraries/typescript/src/stream-handler.ts # libraries/typescript/tests/long-turn-filler.mocked.test.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds comprehensive zero-config setup wizards for both Hermes and OpenClaw voice shells, implements per-call session-key scoping for durable memory, adds echo detection to prevent TTS bleed from triggering false barge-ins, and introduces an opt-in long-turn filler to speak during slow agent-runtime turns. The changes span Python and TypeScript with full parity.
Changes
New CLI Wizards (
patter hermes/patter openclaw)cli_hermes.py/cli_openclaw.py: Zero-config setup wizards withdoctor(preflight checks),setup(project scaffold),test(acceptance probe), and Twilio wiring (attach-number/numbers). OpenClaw variant adds agent-roster awareness and guards against pointing at the default/master agent._cli_common.py: Shared provider-agnostic helpers (Check/Section result model, dotenv parsing, Twilio integration, chat-turn acceptance probe) to prevent drift between the two wizards._hermes_scaffold.py/_openclaw_scaffold.py: Single source of truth for starter project files. OpenClaw scaffold is mode-aware (inbound/outbound/both) and powers both directions with one agent.Session-Key Factory (Feature #7)
models.py: AddedSessionContextdataclass (call_id, caller, callee, caller_hash) andhash_caller()for non-reversible per-caller memory scoping.openai_compatible.py(Python/TS): Newsession_key_factoryparameter that derives the session-key header value per call from SessionContext, enabling durable memory scoped per caller without raw numbers on the wire.hermes.py(Python/TS): Conveniencesession_key_from="caller_hash"preset that installs the default factory.llm_loop.py: Updated to threadcaller/calleeintoprovider.stream()alongside existingcall_id, with per-provider signature detection to maintain backward compatibility.Echo Detection & Dedup (Residual Hermes/OpenClaw Fixes)
stream_handler.py(Python/TS): Added_looks_like_echo()/looksLikeEcho()to detect agent TTS bleeding into STT (substring match or high word-overlap), preventing false barge-ins on no-AEC links. Added_is_near_duplicate()/isNearDuplicate()for back-to-back dedup.test_pipeline_echo_dedup.py(Python) andpipeline-echo-dedup.mocked.test.ts(TS) verify echo guard and dedup logic.Long-Turn Filler (Feature #8)
client.py: Newlong_turn_message/long_turn_message_after_sagent parameters for opt-in filler during slow LLM turns (e.g., agent runtime running tools).stream_handler.py: Filler task scheduling / cancellation in pipeline mode; speaks via the same per-sentence TTS primitive.test_long_turn_filler.py(Python) andlong-turn-filler.mocked.test.ts(TS) exercise real LLMLoop + filler scheduling with mocked provider timing.Multi-Turn Turn-Taking Fixes
stream_handler.py: Fixed tail-grace misclassification (VAD speech_start during echo-tail grace period now rescues the new turn instead of mis-detecting as barge-in).test_pipeline_multiturn_tail_grace.py(Python) andpipeline-multiturn-tail-grace.mocked.test.ts(TS) reproduce and verify the live-call failure where subsequent turns went silent.Backgrounded Barge-In
https://claude.ai/code/session_018Xbkscpu4DBb8cCRCj4zqV