feat(llm): agent runtimes as primary LLM — OpenAICompatibleLLM/HermesLLM/OpenClawLLM + spoken error fallback#157
Merged
Conversation
…LLM/OpenClawLLM + spoken error fallback
Let Patter act as the voice shell in front of an OpenAI-compatible agent
runtime: carrier + STT + turn-taking + TTS stay in Patter while each turn is
answered by POST {base_url}/chat/completions. Adds three pipeline-mode LLM
providers (full Python/TypeScript parity) plus an opt-in dead-air fallback.
- OpenAICompatibleLLM: generic provider for any OpenAI-compatible endpoint
(Hermes, OpenClaw, Ollama, vLLM, LM Studio). Thin subclass of OpenAILLMProvider
with a configurable long timeout (default 60s) and opt-in session continuity.
Keyless local gateways supported via the conventional EMPTY sentinel; warmup
omits the Authorization header when no key is set.
- HermesLLM: preset for the Hermes gateway (127.0.0.1:8642, model hermes-agent,
API_SERVER_KEY, 120s timeout).
- OpenClawLLM: preset for the OpenClaw gateway (127.0.0.1:18789, openclaw/<agent>,
OPENCLAW_API_KEY, x-openclaw-session-key), aligned with the shipped consult target.
- Session continuity: opt-in user=patter-call-<call_id> (+ optional session header)
so the runtime keys one session per call. call_id threaded through the LLM loop
additively (backward compatible).
- agent.llm_error_message / llmErrorMessage: opt-in spoken fallback when a turn's
LLM stream raises (gateway down / timeout) before any audio reached the caller.
Gated on emitted audio (not tokens) so a partial-token timeout still triggers it
and an already-spoken sentence never double-speaks. Default None/undefined
preserves today's silence-on-error behaviour.
Docs: hermes.mdx article ("Call your Hermes Agent over the phone using Patter")
+ openclaw.mdx section. Tests authentic — mock only the HTTP / TTS boundary.
… custom-provider call_id back-compat Hermes is stateless and keys session continuity off request HEADERS, not the OpenAI user field. HermesLLM now sends X-Hermes-Session-Id: patter-call-<call_id> per call (primary mechanism) plus an optional static X-Hermes-Session-Key for long-term memory scoping (new opt-in session_key / sessionKey param, default off). - OpenAICompatibleLLM: decouple session-header emission from the user-field gating; split into session_id_header/session_id_prefix (per-call) and session_key_header/session_key (static memory scope). Each emitted independently; pre-existing extra_headers preserved. An empty-string session key is treated as unset (no empty header on the wire). - OpenClaw: byte-identical on the wire (user=patter-call-<id> + x-openclaw-session-key=<id>) — its gateway derives the session from the user field, so its behaviour is intentionally unchanged. - Backward compat: LLMLoop passes call_id only when the provider's stream() accepts it (inspect.signature guard, cached per provider type), so a custom provider with stream(messages, tools, *, cancel_event) no longer raises TypeError. TS already tolerates the extra options field; added a regression test. - Docs: hermes.mdx session-continuity prose corrected to the header mechanism. Python 2183 / TypeScript 1738 tests pass; tsc + build clean.
Collaborator
Author
|
Bumped to 0.6.5 in This PR is now the 0.6.5 release. After CI is green and you merge, tag |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
POST {base_url}/chat/completions. This is the "custom-LLM voice layer" model (Patter plays the role ElevenLabs Agents play in the ElevenLabs↔Hermes setup), but self-hosted — if Patter and the agent run on the same box, the agent gateway never needs to be tunnelled/exposed; only Patter faces the carrier.OpenAICompatibleLLMplus thinHermesLLMandOpenClawLLMpresets.Implementation
OpenAICompatibleLLM(libraries/python/getpatter/llm/openai_compatible.py,libraries/typescript/src/llm/openai-compatible.ts) — drives any OpenAI-compatible endpoint (Hermes, OpenClaw, Ollama, vLLM, LM Studio). Modelled 1:1 on the existingGroqLLM/CerebrasLLMpresets. Adds a configurable long timeout (default 60s; the base provider sets none) and opt-in session continuity. Keyless local gateways supported via the conventionalEMPTYsentinel;warmup()omits theAuthorizationheader when no key is set.HermesLLM— preset:127.0.0.1:8642/v1, modelhermes-agent(API_SERVER_MODEL_NAMEfallback), key fromAPI_SERVER_KEY, 120s timeout.OpenClawLLM— preset:127.0.0.1:18789/v1,model="openclaw/<agent>"pass-through, key fromOPENCLAW_API_KEY,x-openclaw-session-keyheader. Reuses the agent-target/charset rules of the shipped consult target so the two paths can't drift.user=patter-call-<call_id>(+ optional session header) so the runtime keys one session per call.call_idis threaded through the LLM loop additively (existing providers gain an optional param; no behaviour change).agent.llm_error_message/llmErrorMessage— opt-in spoken fallback wired into the pipeline stream-handler's existing LLM-error branch, reusing the same TTS-speak primitive asfirst_message. Trigger is gated on emitted audio (first_tts_chunk/ttsFirstByteSent), not on tokens received — so a provider that streams partial tokens ("Let me check…") and then times out before a sentence boundary still triggers the line, while an already-spoken sentence never double-speaks. DefaultNone/undefinedpreserves today's silence-on-error behaviour.OpenAICompatibleLLM,OpenAICompatibleLLMProvider,HermesLLM,OpenClawLLM(Python and TypeScript symmetric).Breaking change?
No. Every new field is optional with a default that preserves current behaviour (
None/undefined). No existing constructor path requires a new key. Thecall_idthreading on existing providers is additive.Test plan
pytest tests/— 2168 passed (incl. newtest_llm_openai_compatible.py,test_llm_hermes_openclaw_presets.py,test_llm_loop_call_id_threading.py,tests/unit/test_llm_error_fallback.py)npm test(1729 passed) +npm run lint(tsc clean) +npm run buildDocs updates
docs/integrations/hermes.mdx— "Call your Hermes Agent over the phone using Patter" article (architecture, gateway setup, Python/TS examples, the localhost-not-ngrok security note, dead-air caveat).docs/integrations/openclaw.mdx—OpenClawLLM-as-primary section + generic OpenAI-compatible runtime note (Ollama/vLLM/LM Studio).Follow-ups (not in this PR)
hermes/openclaw/openai_compatibleas--llmvalues into thegetpatter initwizard + adoctorcommand — blueprint prepared; lands on the init-wizard branch where the wizard lives.cache_read_tokensnaming items flagged by review — separate maintenance PR.