feat(llm): Hermes DX — TS namespace exports, caller-hash session key, long-turn filler #159

Open
nicolotognoni wants to merge 2 commits into main from feat/hermes-dx

Conversation

@nicolotognoni

Collaborator

Summary

Three opt-in developer-experience improvements for the agent-as-primary-LLM providers (shipped in 0.6.5), with full Python/TypeScript parity. No core behaviour changes: every new field is optional and defaults to today's behaviour.

Implementation

  • TypeScript namespace exports (src/index.ts): `import { hermes, openclaw, openaiCompatible } from "getpatter"` → `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes; hermes.LLM()`. The namespace objects are frozen; the named HermesLLM/OpenClawLLM/OpenAICompatibleLLM exports remain.
  • Session key from a caller hash (#7): new public SessionContext + hash_caller / hashCaller helper (SHA-256, 16 hex chars). OpenAICompatibleLLM/HermesLLM gain session_key_factory / sessionKeyFactory and the shortcut session_key_from="caller_hash" → emits X-Hermes-Session-Key: patter-caller-<hash>, so a runtime remembers a caller across calls without the raw phone number ever reaching the wire or the logs. The factory takes precedence over the static session_key; a falsy return omits the header. The loop dispatch was generalised to thread caller/callee only to providers whose stream() declares them (or **kwargs) — built-in and minimal custom providers are unchanged. An unknown session_key_from raises in both SDKs.
  • Long-turn filler (#8): long_turn_message / longTurnMessage (+ _after_s, default 4 s). When a turn is slow (agent runtime running tools) and no audio has reached the caller yet, Patter speaks a short configurable line instead of dead silence. Distinct from llm_error_message (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised via an async clear() that awaits an in-flight filler so it can never overlap the real reply.
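
To make the session-key rules above concrete, here is a minimal Python sketch. It is not the SDK's code: `hash_caller_sketch`, `session_headers`, and `caller_hash_factory` are hypothetical names, the factory here takes a plain caller string rather than the real `SessionContext`, and any normalisation or salting the actual `hash_caller` may apply is not described in this PR and is therefore not shown.

```python
import hashlib
from typing import Callable, Dict, Optional

HEADER = "X-Hermes-Session-Key"


def hash_caller_sketch(caller: str) -> str:
    """Return the first 16 hex chars of SHA-256(caller) (assumed truncation)."""
    return hashlib.sha256(caller.encode("utf-8")).hexdigest()[:16]


def caller_hash_factory(caller: str) -> str:
    """Build the key format quoted in the PR: patter-caller-<hash>."""
    return f"patter-caller-{hash_caller_sketch(caller)}"


def session_headers(
    caller: str,
    static_key: Optional[str] = None,
    factory: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    # The factory takes precedence over the static key.
    key = factory(caller) if factory is not None else static_key
    # A falsy key (None or "") omits the header entirely.
    return {HEADER: key} if key else {}
```

With this shape the raw number is only ever an input to the hash, so the header value carries a stable per-caller identifier without containing the number itself.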

Breaking change?

No — all new fields optional with defaults that preserve current behaviour.

Test plan

  • Python: pytest tests/ — 2206 passed (+23: test_llm_session_key_factory.py, tests/unit/test_long_turn_filler.py)
  • TypeScript: npm test (1758 passed) + npm run lint (tsc clean) + npm run build
  • Adversarial review pass: caught and fixed a TS filler double-speak race (the setTimeout callback could overlap the first real sentence; Python's asyncio path was immune), a Py/TS parity gap on session_key_from validation, and mutable namespace objects (now frozen with Object.freeze).

Docs updates

  • docs/integrations/hermes.mdx — caller-hash session-key scoping + the long-turn filler.

Follow-ups (not in this PR)

  • CLI init / doctor / run / call hermes — land on the init-wizard branch (the 36 KB wizard lives there).
  • Outbound patter_call tool for Hermes — via the patter-mcp server.

… long-turn filler

Three opt-in developer-experience improvements for the agent-LLM providers, with full
Python/TypeScript parity.

- TypeScript namespace exports: `import { hermes, openclaw, openaiCompatible }` ->
  `new hermes.LLM()`, mirroring Python's `from getpatter.llm import hermes`. Frozen objects.
- session_key_factory / sessionKeyFactory + session_key_from="caller_hash": derive the
  X-Hermes-Session-Key per call from a SHA-256 caller hash (new public SessionContext +
  hash_caller/hashCaller), so an agent runtime remembers a caller across calls WITHOUT the
  raw phone number ever reaching the wire or the logs. The factory takes precedence over
  the static session_key; a falsy return omits the header. The loop dispatch was
  generalised to thread caller/callee only to providers whose stream() declares them (or
  **kwargs) — built-in and minimal custom providers unchanged. An unknown session_key_from
  raises in both SDKs (parity).
- long_turn_message / longTurnMessage (+ _after_s, default 4 s): opt-in spoken filler when
  a turn is slow and no audio has reached the caller yet — distinct from llm_error_message
  (which fires on error). Fires once, gated on emitted audio; the TS timer is serialised
  via an async clear() that awaits an in-flight filler so it can never overlap the real
  sentence.
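
The "thread caller/callee only to providers whose stream() declares them (or **kwargs)" rule can be approximated with signature inspection. The sketch below is one plausible way to do it, not the loop's actual dispatch code; `call_kwargs_for` is a hypothetical helper name.

```python
import inspect
from typing import Any, Callable, Dict


def call_kwargs_for(stream_fn: Callable[..., Any], caller: str, callee: str) -> Dict[str, str]:
    """Return only the extra keyword arguments that stream_fn can accept."""
    params = inspect.signature(stream_fn).parameters
    # A **kwargs parameter means the provider accepts any extra keyword.
    takes_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    extras = {"caller": caller, "callee": callee}
    return {
        name: value
        for name, value in extras.items()
        if takes_var_kw or name in params
    }
```

A provider whose `stream()` declares neither name gets an empty dict back, which is why existing minimal providers keep working without changes.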

Adversarial review caught and fixed a TS filler double-speak race (the setTimeout callback
could overlap the first real sentence; Python's asyncio path was immune).
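
As an illustration of the "fires once, gated on emitted audio" behaviour and of the serialised clear step, here is a small asyncio sketch. It is a simplified model under stated assumptions, not the SDK's implementation: `speak_turn_sketch`, `speak`, and `chunks` are hypothetical names standing in for the TTS sink and the LLM reply stream.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def speak_turn_sketch(
    chunks: AsyncIterator[str],
    speak: Callable[[str], Awaitable[None]],
    filler: str,
    after_s: float = 4.0,
) -> None:
    started = False  # True once the filler has begun speaking

    async def fire() -> None:
        nonlocal started
        await asyncio.sleep(after_s)
        started = True
        await speak(filler)

    timer = asyncio.create_task(fire())

    async def clear() -> None:
        # Cancel a pending timer, but let an in-flight filler finish so it
        # cannot overlap the real reply.
        if not started:
            timer.cancel()
        try:
            await timer
        except asyncio.CancelledError:
            pass  # the timer was cancelled before it fired (simplification)

    first = True
    try:
        async for chunk in chunks:
            if first:
                await clear()  # gate: no filler once real audio is about to start
                first = False
            await speak(chunk)
    finally:
        await clear()
```

Because there is a single timer task per turn, the filler can be spoken at most once, and the real reply always waits for `clear()` before its first sentence.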

Python 2206 / TypeScript 1758 tests pass; tsc + build clean.
… per-turn cancel-event reset

Live-call bug: in pipeline mode (Twilio + Deepgram STT + ElevenLabs TTS + an
agent-LLM provider) the FIRST turn worked end-to-end but every subsequent turn
went silent, leaving a ghost metrics turn of user_text='' agent_text='[interrupted]'.

Two root causes in the pipeline turn-taking state machine, both fixed with full Python/TypeScript parity:

1. Tail-grace misclassified the next turn as a barge-in.
   After the agent finishes TTS, _end_speaking_with_grace / endSpeakingWithGrace
   keeps _is_speaking=true for PATTER_TTS_TAIL_GRACE_MS (default 1500 ms) to
   swallow the fading echo tail. Humans reply in 200-700 ms — inside that window
   — so the user's next utterance was detected as a barge-in: it recorded an
   interrupted turn and the leading audio was withheld from STT (only a <=260 ms
   echo-contaminated ring), so no final transcript was produced and the agent
   never answered. New _tail_grace_active / tailGraceActive flag distinguishes
   "actively streaming TTS" from "post-TTS echo guard". A VAD speech_start OR a
   transcript during the tail grace now ends the grace and dispatches as a clean
   NEW turn via _end_tail_grace_for_new_turn / endTailGraceForNewTurn —
   recovering the leading audio from the ring instead of dropping it, with no
   spurious send_clear / record_turn_interrupted. Real barge-in during active
   TTS (tail_grace_active=false) is unchanged. The Python grace flip task is now
   tracked and cancelled (parity with TS clearGraceTimer) so at most one is in
   flight.

2. (Python) A barge-in's per-turn cancel event leaked into the next turn.
   _llm_cancel_event was recreated inside _process_streaming_response — AFTER
   LLMLoop.run had already captured the previous (still-set) event for the next
   turn — so the turn after any real barge-in bailed immediately. The reset
   moved to the top of _dispatch_turn, before dispatch; the event object is now
   stable through a turn (generator and consumption loop share it). TypeScript
   already allocates a fresh AbortController per turn in runPipelineLlm.
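
The two fixes above can be modelled with a deliberately small state object. This is a hedged sketch, not the pipeline's real class: `PipelineTurnSketch` and its method names are hypothetical, and ring-buffer recovery, timers, and metrics are left out.

```python
import asyncio


class PipelineTurnSketch:
    """Simplified turn-taking state for illustration only."""

    def __init__(self) -> None:
        self.is_speaking = False        # True while TTS streams and during tail grace
        self.tail_grace_active = False  # True only during the post-TTS echo guard
        self.cancel_event = asyncio.Event()

    def start_tts(self) -> None:
        self.is_speaking = True
        self.tail_grace_active = False

    def end_tts_with_grace(self) -> None:
        # is_speaking stays set to swallow the echo tail, but the flag records
        # that the agent is no longer actively talking.
        self.tail_grace_active = True

    def on_speech_start(self) -> str:
        if self.is_speaking and not self.tail_grace_active:
            # Real barge-in during active TTS: cancel the running turn.
            self.cancel_event.set()
            self.is_speaking = False
            return "barge_in"
        # Idle or inside the tail grace: end the grace and start a clean turn.
        self.is_speaking = False
        self.tail_grace_active = False
        return "new_turn"

    def dispatch_turn(self) -> bool:
        # Reset before anything captures the event, so a cancel set by an
        # earlier barge-in cannot leak into this turn.
        self.cancel_event = asyncio.Event()
        captured = self.cancel_event
        return not captured.is_set()  # True means the turn may proceed
```

Speech during the grace window returns `"new_turn"` and leaves the cancel event untouched, while speech during active TTS returns `"barge_in"`; the following `dispatch_turn()` still proceeds because it starts from a fresh event.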

Tests: new test_pipeline_multiturn_tail_grace.py (6) +
pipeline-multiturn-tail-grace.mocked.test.ts (4) reproduce the bug and assert
the rescue, the flag lifecycle, the active-TTS barge-in regression guard, and
the fresh cancel-event. Python 2212 / TypeScript 1763 pass; tsc + build clean.
Adversarial review: 0 critical / 0 high.