feat(llm): Hermes DX — TS namespace exports, caller-hash session key, long-turn filler#159
Open
nicolotognoni wants to merge 2 commits into
… long-turn filler
Three opt-in developer-experience improvements for the agent-LLM providers, full
Python/TypeScript parity.
- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` ->
`new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the
X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext +
hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the
raw phone number ever reaching the wire or the logs. The factory takes precedence over
the static session_key; a falsy return omits the header. The loop dispatch was
generalised to thread caller/callee only to providers whose stream() declares them (or
**kwargs) — built-in and minimal custom providers unchanged. An unknown session_key_from
raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when
a turn is slow and no audio has reached the caller yet — distinct from llm_error_message
(which fires on error). Fires once, gated on emitted audio; the TS timer is serialised
via an async clear() that awaits an in-flight filler so it can never overlap the real
sentence.
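A minimal Python sketch of the session-key derivation described in the list above. The `SessionContext` fields, the unsalted UTF-8 hashing, and the `session_headers` helper are assumptions made for illustration; only the SHA-256 hash, the 16-hex-char length, the `patter-caller-<hash>` format, and the precedence rules come from this PR.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SessionContext:
    """Illustrative shape only; the real SessionContext fields may differ."""
    caller: str
    callee: str


def hash_caller(caller: str) -> str:
    # Assumption: plain UTF-8 SHA-256 with no salt or number normalisation,
    # truncated to 16 hex chars as the PR describes.
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16]


def session_headers(
    ctx: SessionContext,
    session_key: Optional[str] = None,
    session_key_factory: Optional[Callable[[SessionContext], Optional[str]]] = None,
) -> Dict[str, str]:
    # Hypothetical helper: the factory wins over the static key,
    # and a falsy result omits the header entirely.
    key = session_key_factory(ctx) if session_key_factory else session_key
    return {"X-Hermes-Session-Key": key} if key else {}


def caller_hash_factory(ctx: SessionContext) -> str:
    # Roughly what session_key_from="caller_hash" is described as producing.
    return f"patter-caller-{hash_caller(ctx.caller)}"
```

With this shape, the raw caller number is only ever an input to the hash; the header value carries the 16-character digest prefix.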
Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback
could overlap the first real sentence; Python's asyncio path was immune).
Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
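For illustration, the long-turn filler gating can be sketched in asyncio as follows. `LongTurnFiller`, its `speak` callback, and the method names are hypothetical stand-ins rather than the SDK's internals; the sketch only shows the described behaviour: fire once, only if no audio has been emitted, and never overlap the first real sentence.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LongTurnFiller:
    """Hypothetical helper: speak one filler line if a turn stays silent too long."""

    def __init__(
        self,
        speak: Callable[[str], Awaitable[None]],
        message: str,
        after_s: float = 4.0,
    ) -> None:
        self._speak = speak
        self._message = message
        self._after_s = after_s
        self._audio_emitted = False
        self._speaking = False
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # One timer task per turn, so the filler can fire at most once.
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        await asyncio.sleep(self._after_s)
        if self._audio_emitted:
            return  # real audio already reached the caller
        self._speaking = True
        await self._speak(self._message)

    async def clear(self) -> None:
        """Call before speaking the first real sentence of the turn."""
        self._audio_emitted = True
        task, self._task = self._task, None
        if task is None:
            return
        if not self._speaking:
            task.cancel()  # still waiting on the timer: drop the filler
        # If the filler is mid-sentence, wait for it instead of talking over it.
        await asyncio.gather(task, return_exceptions=True)
```

The caller would `start()` the filler when the turn is dispatched and `await clear()` immediately before the first real sentence is sent to TTS.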
… per-turn cancel-event reset
Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'.
Two root causes in the pipeline turn-taking state machine, full Python/TypeScript parity:
1. Tail-grace misclassified the next turn as a barge-in. After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to swallow the fading echo tail. Humans reply in 200-700 ms, inside that window, so the user's next utterance was detected as a barge-in: it recorded an interrupted turn and the leading audio was withheld from STT (only a <=260 ms echo-contaminated ring), so no final transcript was produced and the agent never answered.
   New _tail_grace_active / tailGraceActive flag distinguishes "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a transcript during the tail grace now ends the grace and dispatches as a clean NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn, recovering the leading audio from the ring instead of dropping it, with no spurious send_clear / record_turn_interrupted. Real barge-in during active TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now tracked and cancelled (parity with TS clearGraceTimer) so at most one is in flight.
2. (Python) A barge-in's per-turn cancel event leaked into the next turn. _llm_cancel_event was recreated inside _process_streaming_response, AFTER LLMLoop.run had already captured the previous (still-set) event for the next turn, so the turn after any real barge-in bailed immediately. The reset moved to the top of _dispatch_turn, before dispatch; the event object is now stable through a turn (generator and consumption loop share it). TypeScript already allocates a fresh AbortController per turn in runPipelineLlm.
Tests: new test_pipeline_multiturn_tail_grace.py (6) + pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and the fresh cancel-event.
Python 2212 / TypeScript 1763 pass; tsc + build clean. Adversarial review: 0 critical / 0 high.
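The two fixes above can be sketched as a toy state machine. `TurnState` and its method names are hypothetical and heavily simplified (no audio ring, no timers); the sketch only illustrates how a separate tail-grace flag separates a real barge-in from a prompt reply, and why the cancel event is reset before dispatch.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical stand-in for the pipeline's turn-taking state."""
    is_speaking: bool = False        # True while TTS streams AND during tail grace
    tail_grace_active: bool = False  # True only during the post-TTS echo guard
    cancel_event: asyncio.Event = field(default_factory=asyncio.Event)
    interrupted_turns: int = 0

    def end_speaking_with_grace(self) -> None:
        # TTS finished: keep is_speaking set, but mark it as echo guard only.
        self.is_speaking = True
        self.tail_grace_active = True

    def on_speech_start(self) -> str:
        if self.is_speaking and self.tail_grace_active:
            # User replied inside the grace window: end grace, clean new turn,
            # nothing recorded as interrupted.
            self.is_speaking = False
            self.tail_grace_active = False
            return "new_turn"
        if self.is_speaking:
            # Real barge-in during active TTS: cancel and record it.
            self.cancel_event.set()
            self.interrupted_turns += 1
            self.is_speaking = False
            return "barge_in"
        return "new_turn"

    def dispatch_turn(self) -> asyncio.Event:
        # Reset BEFORE dispatch so the LLM loop captures a fresh, unset event
        # rather than the one a previous barge-in left set.
        self.cancel_event = asyncio.Event()
        return self.cancel_event
```

In this toy model, a barge-in followed by `dispatch_turn()` yields an unset event for the next turn, and a `speech_start` that arrives after `end_speaking_with_grace()` is classified as a new turn without incrementing the interrupted count.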
Summary
Three opt-in developer-experience improvements for the agent-as-primary-LLM providers (shipped in 0.6.5), full Python/TypeScript parity. No core behaviour changes — every new field is optional and defaults to today's behaviour.
Implementation
- TypeScript namespace exports (`src/index.ts`): `import { hermes, openclaw, openaiCompatible } from "getpatter"` → `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes; hermes.LLM()`. Frozen objects; the named `HermesLLM` / `OpenClawLLM` / `OpenAICompatibleLLM` exports remain.
- Caller-hash session key (#7): new public `SessionContext` + `hash_caller` / `hashCaller` helper (SHA-256, 16 hex chars). `OpenAICompatibleLLM` / `HermesLLM` gain `session_key_factory` / `sessionKeyFactory` and the shortcut `session_key_from="caller_hash"` → emits `X-Hermes-Session-Key: patter-caller-<hash>`, so a runtime remembers a caller across calls without the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static `session_key`; a falsy return omits the header. The loop dispatch was generalised to thread `caller` / `callee` only to providers whose `stream()` declares them (or `**kwargs`); built-in and minimal custom providers are unchanged. An unknown `session_key_from` raises in both SDKs.
- Long-turn filler (#8): `long_turn_message` / `longTurnMessage` (+ `_after_s`, default 4 s). When a turn is slow (agent runtime running tools) and no audio has reached the caller yet, Patter speaks a short configurable line instead of dead silence. Distinct from `llm_error_message` (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async `clear()` that awaits an in-flight filler so it can never overlap the real reply.
Breaking change?
No — all new fields optional with defaults that preserve current behaviour.
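As a rough illustration of why minimal custom providers keep working, signature-based dispatch of `caller` / `callee` can be written with `inspect.signature`. `call_context_kwargs` and the three example providers are hypothetical names for this sketch, not the SDK's actual loop code.

```python
import inspect
from typing import Any, Callable, Dict


def call_context_kwargs(stream: Callable[..., Any], caller: str, callee: str) -> Dict[str, str]:
    """Return only the call-context kwargs that `stream` can accept."""
    params = inspect.signature(stream).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    offered = {"caller": caller, "callee": callee}
    # Pass a value only if it is declared by name or absorbed by **kwargs.
    return {k: v for k, v in offered.items() if accepts_var_kw or k in params}


# Hypothetical provider signatures, for illustration only.
def minimal_stream(messages):
    return messages


def aware_stream(messages, caller=None, callee=None):
    return messages, caller, callee


def catch_all_stream(messages, **kwargs):
    return messages, kwargs
```

A provider shaped like `minimal_stream` receives no extra keyword arguments, so its call site is unaffected, while `aware_stream` and `catch_all_stream` both receive `caller` and `callee`.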
Test plan
- `pytest tests/`: 2206 passed (+23: `test_llm_session_key_factory.py`, `tests/unit/test_long_turn_filler.py`)
- `npm test` (1758 passed) + `npm run lint` (tsc clean) + `npm run build`
- Adversarial review caught and fixed a TS filler double-speak race (the `setTimeout` callback could overlap the first real sentence; Python's asyncio path was immune), a Py/TS parity gap on `session_key_from` validation, and mutable namespace objects (now `Object.freeze`d).
Docs updates
- `docs/integrations/hermes.mdx`: caller-hash session-key scoping + the long-turn filler.
Follow-ups (not in this PR)
- `init/doctor/run/call hermes`: land on the init-wizard branch (the 36 KB wizard lives there).
- `patter_call` tool for Hermes: via the `patter-mcp` server.