feat(llm): Hermes DX — TS namespace exports, caller-hash session key, long-turn filler by nicolotognoni · Pull Request #159 · PatterAI/Patter

nicolotognoni · 2026-06-05T17:28:40Z

Summary

Three opt-in developer-experience improvements for the agent-as-primary-LLM providers (shipped in 0.6.5), full Python/TypeScript parity. No core behaviour changes — every new field is optional and defaults to today's behaviour.

Implementation

TypeScript namespace exports (src/index.ts): import { hermes, openclaw, openaiCompatible } from "getpatter" → new hermes.LLM(), mirroring Python's from getpatter.llm import hermes; hermes.LLM(). Frozen objects; the named HermesLLM/OpenClawLLM/OpenAICompatibleLLM exports remain.
Session key from a caller hash (#7): new public SessionContext + hash_caller / hashCaller helper (SHA-256, 16 hex chars). OpenAICompatibleLLM/HermesLLM gain session_key_factory / sessionKeyFactory and the shortcut session_key_from="caller_hash" → emits X-Hermes-Session-Key: patter-caller-<hash>, so a runtime remembers a caller across calls without the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static session_key; a falsy return omits the header. The loop dispatch was generalised to thread caller/callee only to providers whose stream() declares them (or **kwargs) — built-in and minimal custom providers are unchanged. An unknown session_key_from raises in both SDKs.
Long-turn filler (#8): long_turn_message / longTurnMessage (+ _after_s, default 4 s). When a turn is slow (agent runtime running tools) and no audio has reached the caller yet, Patter speaks a short configurable line instead of dead silence. Distinct from llm_error_message (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async clear() that awaits an in-flight filler so it can never overlap the real reply.
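The caller-hash session key described above can be sketched in Python as follows. This is a minimal illustration, not the SDK's code: the `SessionContext` fields, the `session_headers` helper, the `ValueError` type, the choice of the first 16 hex characters, and the rule that an explicit factory beats the `session_key_from` shortcut are all assumptions. Only the header name, the `patter-caller-` prefix, SHA-256, the 16-character length, factory precedence, and the falsy-omits-header rule come from the PR text.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional

HEADER = "X-Hermes-Session-Key"


@dataclass(frozen=True)
class SessionContext:
    # Hypothetical fields: the PR names SessionContext but does not list them.
    caller: Optional[str] = None
    callee: Optional[str] = None


def hash_caller(caller: str) -> str:
    # SHA-256 over the UTF-8 bytes, truncated to 16 hex characters.
    # Assumption: no salt or normalisation, and the first 16 characters are kept.
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16]


def _caller_hash_factory(ctx: SessionContext) -> Optional[str]:
    # Assumption: without a caller ID there is nothing to hash, so no key.
    return f"patter-caller-{hash_caller(ctx.caller)}" if ctx.caller else None


def session_headers(
    ctx: SessionContext,
    session_key: Optional[str] = None,
    session_key_factory: Optional[Callable[[SessionContext], Optional[str]]] = None,
    session_key_from: Optional[str] = None,
) -> Dict[str, str]:
    if session_key_from is not None:
        if session_key_from != "caller_hash":
            # The PR says an unknown value raises; ValueError is an assumption.
            raise ValueError(f"unknown session_key_from: {session_key_from!r}")
        # Assumption: an explicitly supplied factory wins over the shortcut.
        session_key_factory = session_key_factory or _caller_hash_factory
    if session_key_factory is not None:
        # The factory takes precedence over the static key.
        key = session_key_factory(ctx)
        # A falsy return omits the header entirely.
        return {HEADER: key} if key else {}
    return {HEADER: session_key} if session_key else {}
```

With this sketch, `session_headers(SessionContext(caller="+15550100"), session_key_from="caller_hash")` yields a single header whose value starts with `patter-caller-` and never contains the raw number.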

Breaking change?

No — all new fields optional with defaults that preserve current behaviour.
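One way the generalised loop dispatch could leave existing providers untouched is signature inspection: pass `caller` / `callee` only when `stream()` can accept them. The sketch below is an assumption about the mechanism, not the PR's code; `caller_kwargs_for` and the three provider classes are hypothetical names, and passing each argument individually (rather than both or neither) is my reading of "declares them".

```python
import inspect
from typing import Any, Callable, Dict, Optional

_NAMED_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def caller_kwargs_for(
    stream: Callable[..., Any], caller: Optional[str], callee: Optional[str]
) -> Dict[str, Optional[str]]:
    """Return only the caller/callee keyword arguments that `stream` accepts."""
    params = inspect.signature(stream).parameters
    offered = {"caller": caller, "callee": callee}
    # A **kwargs parameter accepts everything, so pass both.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return offered
    # Otherwise pass only the names the signature declares explicitly.
    return {
        name: value
        for name, value in offered.items()
        if name in params and params[name].kind in _NAMED_KINDS
    }


class MinimalProvider:
    # A custom provider written before this PR: no caller/callee parameters.
    def stream(self, messages):
        yield "hello"


class CallerAwareProvider:
    def stream(self, messages, caller=None, callee=None):
        yield f"hello {caller}"


class KwargsProvider:
    def stream(self, messages, **kwargs):
        yield ",".join(sorted(kwargs))
```

A dispatch loop would then call `provider.stream(messages, **caller_kwargs_for(provider.stream, caller, callee))`, which reduces to the old call shape for `MinimalProvider`.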

Test plan

Python: pytest tests/ — 2206 passed (+23: test_llm_session_key_factory.py, tests/unit/test_long_turn_filler.py)
TypeScript: npm test (1758 passed) + npm run lint (tsc clean) + npm run build
Adversarial review pass: caught and fixed a TS filler double-speak race (the setTimeout callback could overlap the first real sentence; Python's asyncio path was immune), a Py/TS parity gap on session_key_from validation, and mutable namespace objects (now frozen with Object.freeze).

Docs updates

docs/integrations/hermes.mdx — caller-hash session-key scoping + the long-turn filler.

Follow-ups (not in this PR)

CLI init / doctor / run / call hermes — land on the init-wizard branch (the 36 KB wizard lives there).
Outbound patter_call tool for Hermes — via the patter-mcp server.

… long-turn filler

Three opt-in developer-experience improvements for the agent-LLM providers, full Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` -> `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext + hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static session_key; a falsy return omits the header. The loop dispatch was generalised to thread caller/callee only to providers whose stream() declares them (or **kwargs); built-in and minimal custom providers are unchanged. An unknown session_key_from raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when a turn is slow and no audio has reached the caller yet, distinct from llm_error_message (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async clear() that awaits an in-flight filler so it can never overlap the real sentence.

Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback could overlap the first real sentence; Python's asyncio path was immune).

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
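The long-turn filler's contract (fire once, only if no audio has been emitted, and never overlap the real reply) can be sketched with asyncio. This is an illustrative sketch under stated assumptions: the `LongTurnFiller` class, its method names, and the `speak` callback are hypothetical and do not come from the SDK. Only the 4 s default, the fire-once rule, the audio gate, and the awaiting `clear()` are taken from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LongTurnFiller:
    """Speaks one filler line if a turn stays silent for `after_s` seconds."""

    def __init__(
        self,
        speak: Callable[[str], Awaitable[None]],  # hypothetical TTS callback
        message: Optional[str],
        after_s: float = 4.0,
    ) -> None:
        self._speak = speak
        self._message = message
        self._after_s = after_s
        self._audio_emitted = False
        self._fired = False
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Opt-in: with no message configured, nothing is scheduled.
        if self._message and self._task is None:
            self._task = asyncio.create_task(self._run())

    def mark_audio_emitted(self) -> None:
        # Call when the first real audio chunk reaches the caller.
        self._audio_emitted = True

    async def _run(self) -> None:
        await asyncio.sleep(self._after_s)
        # Gate on emitted audio and fire at most once.
        if self._audio_emitted or self._fired:
            return
        self._fired = True
        await self._speak(self._message)

    async def clear(self) -> None:
        # Call before speaking the real reply.
        task, self._task = self._task, None
        if task is None:
            return
        if not self._fired:
            # Still waiting on the timer: cancel it.
            task.cancel()
        try:
            # If the filler is mid-speech, wait for it so it cannot
            # overlap the real sentence.
            await task
        except asyncio.CancelledError:
            pass
```

The turn loop would call `start()` when the turn begins, `mark_audio_emitted()` on the first real audio chunk, and `await clear()` immediately before speaking the first real sentence.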

… per-turn cancel-event reset

Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'.

Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity:

1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms, inside that window, so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered.

   New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn, recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight.

2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response, AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn, so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm.

Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event.

Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
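The two fixes in this commit can be illustrated with a small, timer-free state model. It is a sketch under stated assumptions, not the pipeline's code: `PipelineTurnState` and its method names are hypothetical, `threading.Event` stands in for whatever event type `_llm_cancel_event` actually is, and the grace timer, ring buffer, and metrics calls are left out.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class PipelineTurnState:
    is_speaking: bool = False
    tail_grace_active: bool = False
    cancel_event: threading.Event = field(default_factory=threading.Event)

    def start_tts(self) -> None:
        # Actively streaming TTS: a speech_start here is a real barge-in.
        self.is_speaking = True
        self.tail_grace_active = False

    def end_speaking_with_grace(self) -> None:
        # TTS finished; is_speaking stays True as an echo guard,
        # but the flag marks that nothing is actually being streamed.
        self.tail_grace_active = True

    def grace_elapsed(self) -> None:
        # What the grace timer would do after PATTER_TTS_TAIL_GRACE_MS.
        self.is_speaking = False
        self.tail_grace_active = False

    def on_speech_start(self) -> str:
        if self.is_speaking and not self.tail_grace_active:
            # Real barge-in: cancel the in-flight turn.
            self.cancel_event.set()
            self.is_speaking = False
            return "barge_in"
        if self.tail_grace_active:
            # Fix 1: end the grace early and treat this as a clean new turn.
            self.grace_elapsed()
        return "new_turn"

    def dispatch_turn(self) -> threading.Event:
        # Fix 2: allocate the cancel event before anything captures it,
        # so a set event from the previous turn cannot leak into this one.
        self.cancel_event = threading.Event()
        return self.cancel_event
```

In this model a user reply that arrives during the tail grace returns `"new_turn"` without setting the cancel event, while a reply during active TTS still returns `"barge_in"`.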

nicolotognoni added 2 commits June 5, 2026 19:28

nicolotognoni wants to merge 2 commits into main from feat/hermes-dx
