Skip to content

feat(llm): agent runtimes as primary LLM — OpenAICompatibleLLM/HermesLLM/OpenClawLLM + spoken error fallback#157

Merged
nicolotognoni merged 2 commits into
mainfrom
feat/agent-llm-providers
Jun 5, 2026
Merged

feat(llm): agent runtimes as primary LLM — OpenAICompatibleLLM/HermesLLM/OpenClawLLM + spoken error fallback#157
nicolotognoni merged 2 commits into
mainfrom
feat/agent-llm-providers

Conversation

@nicolotognoni

Copy link
Copy Markdown
Collaborator

Summary

  • Patter becomes the voice shell in front of an OpenAI-compatible agent runtime: it owns the carrier + STT + turn-taking + TTS, while each conversation turn is answered by the agent at POST {base_url}/chat/completions. This is the "custom-LLM voice layer" model (Patter plays the role ElevenLabs Agents play in the ElevenLabs↔Hermes setup), but self-hosted — if Patter and the agent run on the same box, the agent gateway never needs to be tunnelled/exposed; only Patter faces the carrier.
  • Three new pipeline-mode LLM providers with full Python/TypeScript parity: a generic OpenAICompatibleLLM plus thin HermesLLM and OpenClawLLM presets.
  • Opt-in spoken fallback so a gateway-down / long-tool-call timeout speaks a configurable line instead of dead air.

Implementation

  • OpenAICompatibleLLM (libraries/python/getpatter/llm/openai_compatible.py, libraries/typescript/src/llm/openai-compatible.ts) — drives any OpenAI-compatible endpoint (Hermes, OpenClaw, Ollama, vLLM, LM Studio). Modelled 1:1 on the existing GroqLLM/CerebrasLLM presets. Adds a configurable long timeout (default 60s; the base provider sets none) and opt-in session continuity. Keyless local gateways supported via the conventional EMPTY sentinel; warmup() omits the Authorization header when no key is set.
  • HermesLLM — preset: 127.0.0.1:8642/v1, model hermes-agent (API_SERVER_MODEL_NAME fallback), key from API_SERVER_KEY, 120s timeout.
  • OpenClawLLM — preset: 127.0.0.1:18789/v1, model="openclaw/<agent>" pass-through, key from OPENCLAW_API_KEY, x-openclaw-session-key header. Reuses the agent-target/charset rules of the shipped consult target so the two paths can't drift.
  • Session continuity — opt-in user=patter-call-<call_id> (+ optional session header) so the runtime keys one session per call. call_id is threaded through the LLM loop additively (existing providers gain an optional param; no behaviour change).
  • agent.llm_error_message / llmErrorMessage — opt-in spoken fallback wired into the pipeline stream-handler's existing LLM-error branch, reusing the same TTS-speak primitive as first_message. Trigger is gated on emitted audio (first_tts_chunk/ttsFirstByteSent), not on tokens received — so a provider that streams partial tokens ("Let me check…") and then times out before a sentence boundary still triggers the line, while an already-spoken sentence never double-speaks. Default None/undefined preserves today's silence-on-error behaviour.
  • Roots re-export OpenAICompatibleLLM, OpenAICompatibleLLMProvider, HermesLLM, OpenClawLLM (Python and TypeScript symmetric).

Breaking change?

No. Every new field is optional with a default that preserves current behaviour (None/undefined). No existing constructor path requires a new key. The call_id threading on existing providers is additive.

Test plan

  • Python: pytest tests/ — 2168 passed (incl. new test_llm_openai_compatible.py, test_llm_hermes_openclaw_presets.py, test_llm_loop_call_id_threading.py, tests/unit/test_llm_error_fallback.py)
  • TypeScript: npm test (1729 passed) + npm run lint (tsc clean) + npm run build
  • New tests are authentic — they exercise real provider construction / header assembly / model routing and the real stream-handler error path, mocking only the HTTP and TTS byte boundaries
  • /parity-check (providers + new field audited in-PR by the parity reviewer; defaults byte-identical across SDKs)

Docs updates

  • docs/integrations/hermes.mdx — "Call your Hermes Agent over the phone using Patter" article (architecture, gateway setup, Python/TS examples, the localhost-not-ngrok security note, dead-air caveat).
  • docs/integrations/openclaw.mdxOpenClawLLM-as-primary section + generic OpenAI-compatible runtime note (Ollama/vLLM/LM Studio).

Follow-ups (not in this PR)

  • Wire hermes/openclaw/openai_compatible as --llm values into the getpatter init wizard + a doctor command — blueprint prepared; lands on the init-wizard branch where the wizard lives.
  • Feature-inventory row (tracked in the private assets repo).
  • Pre-existing low/medium parity-harness and cache_read_tokens naming items flagged by review — separate maintenance PR.

…LLM/OpenClawLLM + spoken error fallback

Let Patter act as the voice shell in front of an OpenAI-compatible agent
runtime: carrier + STT + turn-taking + TTS stay in Patter while each turn is
answered by POST {base_url}/chat/completions. Adds three pipeline-mode LLM
providers (full Python/TypeScript parity) plus an opt-in dead-air fallback.

- OpenAICompatibleLLM: generic provider for any OpenAI-compatible endpoint
  (Hermes, OpenClaw, Ollama, vLLM, LM Studio). Thin subclass of OpenAILLMProvider
  with a configurable long timeout (default 60s) and opt-in session continuity.
  Keyless local gateways supported via the conventional EMPTY sentinel; warmup
  omits the Authorization header when no key is set.
- HermesLLM: preset for the Hermes gateway (127.0.0.1:8642, model hermes-agent,
  API_SERVER_KEY, 120s timeout).
- OpenClawLLM: preset for the OpenClaw gateway (127.0.0.1:18789, openclaw/<agent>,
  OPENCLAW_API_KEY, x-openclaw-session-key), aligned with the shipped consult target.
- Session continuity: opt-in user=patter-call-<call_id> (+ optional session header)
  so the runtime keys one session per call. call_id threaded through the LLM loop
  additively (backward compatible).
- agent.llm_error_message / llmErrorMessage: opt-in spoken fallback when a turn's
  LLM stream raises (gateway down / timeout) before any audio reached the caller.
  Gated on emitted audio (not tokens) so a partial-token timeout still triggers it
  and an already-spoken sentence never double-speaks. Default None/undefined
  preserves today's silence-on-error behaviour.

Docs: hermes.mdx article ("Call your Hermes Agent over the phone using Patter")
+ openclaw.mdx section. Tests authentic — mock only the HTTP / TTS boundary.
… custom-provider call_id back-compat

Hermes is stateless and keys session continuity off request HEADERS, not the
OpenAI user field. HermesLLM now sends X-Hermes-Session-Id: patter-call-<call_id>
per call (primary mechanism) plus an optional static X-Hermes-Session-Key for
long-term memory scoping (new opt-in session_key / sessionKey param, default off).

- OpenAICompatibleLLM: decouple session-header emission from the user-field
  gating; split into session_id_header/session_id_prefix (per-call) and
  session_key_header/session_key (static memory scope). Each emitted
  independently; pre-existing extra_headers preserved. An empty-string session
  key is treated as unset (no empty header on the wire).
- OpenClaw: byte-identical on the wire (user=patter-call-<id> +
  x-openclaw-session-key=<id>) — its gateway derives the session from the user
  field, so its behaviour is intentionally unchanged.
- Backward compat: LLMLoop passes call_id only when the provider's stream()
  accepts it (inspect.signature guard, cached per provider type), so a custom
  provider with stream(messages, tools, *, cancel_event) no longer raises
  TypeError. TS already tolerates the extra options field; added a regression test.
- Docs: hermes.mdx session-continuity prose corrected to the header mechanism.

Python 2183 / TypeScript 1738 tests pass; tsc + build clean.
@nicolotognoni nicolotognoni merged commit 248b1f9 into main Jun 5, 2026
14 checks passed
@nicolotognoni

Copy link
Copy Markdown
Collaborator Author

Bumped to 0.6.5 in 36433c0 (all three version files + lockfile; CHANGELOG ## Unreleased rolled into ## 0.6.5 (2026-06-05)).

This PR is now the 0.6.5 release. After CI is green and you merge, tag v0.6.5 on main to trigger the PyPI + npm publish (per the release-via-PR process — the tag is pushed only after merge, never before).

@nicolotognoni nicolotognoni mentioned this pull request Jun 5, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant