Information-architecture pass on the 840-line README so the most important things come first and secondary/niche content is linked rather than inline: - Reorder: Why -> Quick start -> FEATURES (was at line 502, now right after Quick start) -> Configuration & access -> Docker -> Running tests -> Architecture -> Docs -> Contributors. Readers see what it does before the deployment minutiae. - Consolidate the scattered access sections (start.sh discovery, overrides, remote/SSH, Tailscale, manual launch) under one '## Configuration & access' H2 with H3 subsections. - Extract two genuinely-niche blocks to new linked docs (nothing deleted): - docs/advanced-chat-setup.md — dynamic recall-prefill + Gateway-backed chat - docs/remote-access.md — SSH tunnel + Tailscale + ARM64-Android field report Quick start keeps a one-line pointer to each. - Update Contents TOC + Docs index for the new order and new files. README 840 -> 705 lines; content preserved (verified moved-not-dropped); all internal links + new docs verified to resolve; docs/*.md gitignore-allowlisted.
4.0 KiB
Advanced chat setup
Two optional features for self-hosted Hermes WebUI deployments. Most users need neither — the defaults (in-process chat, no prefill) work out of the box.
Session recall prefill
WebUI can attach ephemeral prefill messages to new browser-originated agent turns. This is useful when a deployment already has a local recall or router script for Joplin, Obsidian, Notion, llm-wiki, or another third-party notes source and wants browser chat to know where durable context lives.
Prefer a compact router-style prefill (for example, "Joplin has the durable project context; use the available notes/search tools before answering detail-dependent questions") instead of dumping the full note corpus into every new browser session. The prefill should point the agent toward retrieval; the notes/search tools should provide the specific facts on demand.
Static JSON remains supported through prefill_messages_file or
HERMES_PREFILL_MESSAGES_FILE. For dynamic recall, opt in explicitly with a
WebUI-specific script hook:
webui_prefill_messages_script:
- python3
- /path/to/notes_recall.py
webui_prefill_messages_script_timeout: 5
or:
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT="python3 /path/to/notes_recall.py" \
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT_TIMEOUT=5 \
./ctl.sh restart
The script may print either an OpenAI-style JSON message list, a JSON object with
a messages list, or plain text; plain text is wrapped as one user prefill
message so dynamic recall text becomes ordinary context instead of an extra
system instruction. If the hook must provide system-level guidance, emit JSON
messages with an explicit role: "system" entry instead. Script output is capped
at 256 KiB before parsing. Parsed prefill context is then bounded by
webui_prefill_context_max_chars or HERMES_WEBUI_PREFILL_CONTEXT_MAX_CHARS
(default: 12,000 characters; set to 0 to disable). When a dynamic script
exceeds the budget and a compact static prefill file is configured, WebUI falls
back to that file. If no compact fallback is available, WebUI injects a short
retrieval instruction instead of sending the oversized note/body payload with
every new browser turn. The browser only receives a compact status event
(source, label, message count, compaction metadata, and redacted errors),
never the prefill message bodies.
Gateway-backed browser chat
By default, browser chat runs through WebUI's in-process legacy runtime. Advanced
self-hosted deployments can opt into routing new browser turns through a running
Hermes Gateway API server while preserving the existing WebUI /api/chat/start
and /api/chat/stream browser contract:
HERMES_WEBUI_CHAT_BACKEND=gateway \
HERMES_WEBUI_GATEWAY_BASE_URL=http://127.0.0.1:8642 \
HERMES_WEBUI_GATEWAY_API_KEY=... \
./ctl.sh restart
HERMES_WEBUI_CHAT_BACKEND is intentionally strict: only gateway,
api_server, or api-server enable the bridge. Generic truthy values such as
1 or true are ignored so existing deployments do not change execution
ownership accidentally. If HERMES_WEBUI_GATEWAY_API_KEY is omitted, WebUI falls
back to API_SERVER_KEY when present. When Gateway returns HTTP 401, WebUI
reports a gateway_auth_error that points at this WebUI↔Gateway key mismatch
rather than showing the Gateway's generic provider-style "Invalid API key" body.
/api/health/agent also includes a redacted gateway_chat block so operators can
see whether gateway mode, base URL, and API-key presence are configured without
exposing the key value. That gateway_chat field is an operator diagnostic
payload only; it is not currently rendered as a user-facing health banner in the
browser UI.
The bridge is best used by operators who already run Hermes Gateway/API Server locally and want browser-originated chat to use the same runtime/tool path as messaging surfaces. Attachments, cancellation, approvals, and clarify prompts still follow WebUI's current compatibility path and may not match every messaging surface until the runtime-adapter migration is complete.