From b2b38354db1312e55d5050335162c607dc1f5783 Mon Sep 17 00:00:00 2001 From: Frank Song Date: Thu, 14 May 2026 22:33:48 +0800 Subject: [PATCH 1/2] Update runtime adapter RFC gates --- CHANGELOG.md | 4 + docs/rfcs/README.md | 7 +- docs/rfcs/hermes-run-adapter-contract.md | 488 ++++++++++++----------- 3 files changed, 262 insertions(+), 237 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b6df1a29..01e2e77d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,10 @@ - **PR #2236** by @jasonjcwu — Silent failure detection in `api/streaming.py` now scans only NEW messages, not the full conversation history. Pre-fix, the `_assistant_added` check at `_run_agent_streaming` scanned all messages in `result["messages"]` (including pre-turn history); if any prior turn contained an assistant response, `_assistant_added` was `True` and the apperror SSE event was silently skipped, leaving the user staring at a blank response after a provider 401/429/rate-limit error. Fix extracts a `_has_new_assistant_reply(all_messages, prev_count)` helper that only inspects messages beyond the pre-turn history offset (`_previous_context_messages`); applied to both the main detection path and the self-heal/retry `_heal_ok` check. 15-test regression suite covering empty/short/long-history scenarios, the heal path, and the `len < prev_count` edge-case fallback. Also includes a small alignment fix to `test_issue1857_usage_overwrite.py` so the FakeAgent message shape matches what the real agent produces. +### Docs + +- **PR #2251** by @franksong2702 (refs #1925) — Updates the Hermes run adapter RFC to codify the #1925 review direction: WebUI stays broad in product scope but becomes thin in execution ownership. The revised RFC credits Michael Lam's "protocol translator, not runtime surrogate" guardrail, defines the browser event/control contract, classifies current runtime state into runner/journal/adapter/presentation ownership, adds an acceptance-test catalog, and gates the first implementation slice to append-only journal/replay without changing `_run_agent_streaming` control flow. + ## [v0.51.60] — 2026-05-14 — Release AJ (stage-353 — 3-PR overlapping Appearance + critical #2223 compression-rotation data-loss fix + Opus SHOULD-FIX on parent_session_id) ### Fixed diff --git a/docs/rfcs/README.md b/docs/rfcs/README.md index ae50b3ce..78fed41c 100644 --- a/docs/rfcs/README.md +++ b/docs/rfcs/README.md @@ -38,8 +38,9 @@ First-time contributor RFCs should be discussed in an issue before opening a PR. ## Current RFCs -- [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md) — Event/control - compatibility contract and gap matrix for moving WebUI chat runs to Hermes-owned - runtime execution. +- [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md) — #1925 + event/control contract, runtime-state ownership matrix, acceptance catalog, + and reversible migration gates for moving WebUI execution behind an explicit + adapter boundary. - [`turn-journal.md`](turn-journal.md) — Crash-safe WebUI turn journal for recovering interrupted chat submissions. diff --git a/docs/rfcs/hermes-run-adapter-contract.md b/docs/rfcs/hermes-run-adapter-contract.md index d6ff2b23..206b066b 100644 --- a/docs/rfcs/hermes-run-adapter-contract.md +++ b/docs/rfcs/hermes-run-adapter-contract.md @@ -1,134 +1,140 @@ -# Hermes Run Adapter Compatibility Contract +# Hermes Run Adapter Contract and Migration Gates - **Status:** Proposed - **Author:** @Michaelyklam +- **Updated by:** @franksong2702 - **Created:** 2026-05-11 +- **Revised:** 2026-05-14 - **Tracking issue:** [#1925](https://github.com/nesquena/hermes-webui/issues/1925) -## Problem +## Credit and Scope -Hermes WebUI currently gives a rich workbench experience, but browser-originated -chat turns are still executed inside the WebUI server process. The WebUI path -creates process-local stream state, starts background agent threads, constructs or -reuses `AIAgent`, and owns callback queues for token, tool, reasoning, approval, -and clarify state. +This RFC codifies the direction discussed in #1925. It does not introduce an +implementation. The central guardrail comes from Michael Lam's review framing: -The target boundary from #1925 is: +> the adapter should be a protocol translator, not a runtime surrogate. + +The product boundary from #1925 is: > WebUI should be thin in execution ownership, not thin in product scope. That means WebUI remains the full browser workbench for sessions, workspace -files, chat rendering, tools, approvals, status, diagnostics, and controls. The -change is that Hermes Agent must own run lifecycle, event ordering, replay, -approvals, clarify, cancellation, and terminal state. +files, chat rendering, tool cards, approvals, status, diagnostics, and controls. +The change is that long-lived execution ownership should move behind an explicit +runtime boundary instead of remaining scattered through the main WebUI request +process. -This document defines the first reviewable contract for a Hermes-owned run -adapter. It is intentionally a spec/gap matrix, not an implementation plan for a -new WebUI runtime surrogate. +This document is intentionally a reviewable spec and migration gate. It should be +accepted before any implementation PR changes the streaming hot path, introduces a +runner process, or moves cancellation / approval / clarify control flow. + +## Problem + +Browser-originated chat turns are still executed inside the WebUI server process. +The current path creates process-local stream state, starts background agent +threads, constructs or reuses `AIAgent`, and owns callback state for token, tool, +reasoning, approval, clarify, cancellation, and terminal events. + +That shape works, but it makes the WebUI process the owner of active runtime +truth. Consequences include: + +- restarting WebUI can orphan active work, +- reconnect depends on process-local state rather than a durable run/event view, +- cancellation and stale writeback bugs recur around ownership boundaries, +- approvals and clarify prompts are tied to live callbacks, +- future Hermes runtime APIs cannot be adopted cleanly because WebUI lacks a + single adapter boundary. + +The immediate goal is not to build a sidecar. The immediate goal is to define the +browser contract, classify current runtime state, and gate the first reversible +journal slice. ## Goals -- Keep the browser-facing WebUI workbench contract stable while execution moves - out of the WebUI process. -- Define the minimum Hermes Runtime API / IPC v0 surface WebUI needs before it - can route new runs to Hermes-owned execution. -- Map current WebUI-owned runtime primitives to Hermes-owned APIs, WebUI - presentation state, or explicit temporary compatibility shims. -- Make restart/reattach the first meaningful success criterion, not merely - "basic chat streamed once." +- Preserve the current rich WebUI workbench experience. +- Make the browser-facing event/control contract explicit. +- Classify every current runtime-owned state primitive as `runner process`, + `journal`, `adapter API surface`, or `WebUI presentation cache`. +- Identify future backend mapping: existing Hermes runtime API, missing Hermes + API, or temporary WebUI compatibility shim. +- Define acceptance tests that must survive any migration. +- Define reversible implementation slices, starting with an append-only + in-process event journal / replay layer. ## Non-goals - Do not implement the adapter in this RFC. -- Do not create a new run-manager sidecar or broker requirement. -- Do not re-create `STREAMS`, cached `AIAgent` objects, approval queues, clarify - queues, or cancellation flags under new names inside WebUI. -- Do not reduce WebUI product scope. The rich workbench UX remains in WebUI. -- Do not require every event to be durably persisted on day one if the first - upstream runtime slice can still prove Hermes-owned execution and reconnect. +- Do not introduce a runner process or sidecar in the first implementation slice. +- Do not change `_run_agent_streaming` control flow in the first journal slice. +- Do not recreate `STREAMS`, cached `AIAgent` objects, callback queues, or + cancellation flags under new names. +- Do not reduce WebUI product scope or move normal workbench UX out of WebUI. +- Do not depend on Hermes Agent shipping a WebUI-specific runtime connector before + WebUI can improve its own boundary. -## Ownership boundary +## Artifact 1: Browser Event and Control Contract -### Hermes Agent owns +This is the compatibility contract the browser depends on, regardless of whether +the backend is today's in-process streaming path, an in-process journaled path, a +future WebUI-managed runner, or a future Hermes `/v1/runs` backend. -- run creation and lifecycle -- run ids and session-to-active-run mapping -- ordered event stream and replay cursor -- terminal run state, final result, and error metadata -- model/provider/profile/toolset routing -- agent execution and tool dispatch -- command semantics and capability metadata -- approval and clarify lifecycle -- cancel, interrupt, queue, continue, steer, and goal control where supported -- durable runtime/session state needed for reconnect +The current inventory should be derived from `static/messages.js` consumers and +SSE/event production in `api/streaming.py`. Future edits to those files should +update this RFC or the implementation contract that replaces it. -### WebUI owns +### Event Envelope -- browser authentication and presentation-specific session routing -- chat layout, transcript rendering, tool cards, thinking/progress display -- approval and clarify widgets -- workspace/file-panel UX -- settings/admin/diagnostics presentation -- adapting Hermes runtime events into WebUI-compatible browser events -- temporary compatibility shims explicitly listed in this RFC - -## WebUI event/control compatibility contract - -The browser-facing contract should remain stable enough that the current WebUI -workbench can render either the legacy in-process runtime or the Hermes-owned run -adapter during migration. These are presentation events over Hermes runtime -truth, not a second source of truth. - -All events should include enough metadata for idempotent rendering and -reconnect: +Every replayable runtime event should be representable with: ```json { "event_id": "run_123:42", "seq": 42, "run_id": "run_123", - "session_id": "20260511_...", - "type": "tool.update", - "created_at": 1778540000.0, + "session_id": "20260514_...", + "type": "tool.updated", + "created_at": 1778750000.0, "terminal": false, "payload": {} } ``` -`event_id` may be an SSE `id:` value or an equivalent cursor token. `seq` is a -monotonic per-run cursor. Clients may send `Last-Event-ID` or `after_seq` on -reconnect. The runtime should treat replay as at-least-once delivery; WebUI must -deduplicate by `run_id` + `seq` / `event_id`. +Required semantics: -### Event families +- `seq` is monotonic per run. +- `event_id` is stable enough to use as an SSE `id:` value or equivalent cursor. +- Reconnect supports `Last-Event-ID` or `after_seq`. +- Replay is at-least-once; WebUI deduplicates by `run_id` + `seq` or `event_id`. +- Terminal runs can replay their final `done`, `cancelled`, or `error` state. -| WebUI event family | Required payload | Runtime source of truth | -|---|---|---| -| `run.started` / `status` | lifecycle state, controls available, session id, workspace/profile/model/toolset summary | Hermes run state | -| `token.delta` | assistant message id/segment id, delta text, optional content type | Hermes model output stream | -| `reasoning.delta` / `reasoning.done` | reasoning text or structured reasoning block, visibility metadata | Hermes reasoning callback/event stream | -| `progress` | concise status/progress text, optional phase/tool context | Hermes agent progress callbacks | -| `tool.started` | tool call id, tool name, sanitized arguments, start time | Hermes tool dispatch lifecycle | -| `tool.updated` | stdout/stderr/structured partial data, progress metadata | Hermes tool dispatch lifecycle | -| `tool.done` | result, exit/status, duration, error flag | Hermes tool dispatch lifecycle | -| `approval.requested` | approval id, command/action summary, risk metadata, available choices | Hermes approval queue/control plane | -| `approval.resolved` | approval id, choice, resulting status | Hermes approval queue/control plane | -| `clarify.requested` | clarify id, question, choices/input mode | Hermes clarify lifecycle | -| `clarify.resolved` | clarify id, answer metadata/status | Hermes clarify lifecycle | -| `title.updated` | title text, title source/confidence | Hermes session/title subsystem | -| `usage.updated` / `usage.final` | tokens, cost, model/provider, duration where available | Hermes usage accounting | -| `error` | stable error code, safe message, redacted diagnostic metadata, terminal flag | Hermes run terminal/error state | -| `done` | final lifecycle state, usage, terminal result/error summary, last seq | Hermes run terminal state | +### Event Families -### Reconnect metadata +| Event family | Required payload | Browser responsibility | Runtime source of truth | +|---|---|---|---| +| `run.started` / `status` | lifecycle state, controls available, session id, workspace/profile/model/toolset summary | render active state and controls | runtime run state | +| `token.delta` | assistant message id or segment id, delta text, content type | append visible assistant text | runtime model output stream | +| `reasoning.delta` / `reasoning.done` | reasoning block id, delta/final text, visibility metadata | render thinking/progress UI | runtime reasoning events | +| `progress` | concise phase/status text, optional tool context | render activity/progress text | runtime progress callbacks | +| `tool.started` | tool call id, name, sanitized arguments, start time | open/update tool card | runtime tool lifecycle | +| `tool.updated` | stdout/stderr/structured partial data, progress metadata | update tool card | runtime tool lifecycle | +| `tool.done` | result, status/exit code, duration, error flag | finalize tool card | runtime tool lifecycle | +| `approval.requested` | approval id, action summary, risk metadata, available choices | show approval widget | runtime approval state | +| `approval.resolved` | approval id, choice, resulting status | close/update approval widget | runtime approval state | +| `clarify.requested` | clarify id, question, choices/input mode | show clarify widget | runtime clarify state | +| `clarify.resolved` | clarify id, answer metadata/status | close/update clarify widget | runtime clarify state | +| `title.updated` | title text, source/confidence | update title surfaces | session/title subsystem | +| `usage.updated` / `usage.final` | tokens, cost, model/provider, duration where available | update usage surfaces | runtime usage accounting | +| `error` | stable error code, safe message, redacted diagnostics, terminal flag | render error and final state | runtime terminal/error state | +| `done` | final lifecycle state, usage, terminal result/error summary, last seq | finalize run UI | runtime terminal state | + +### Reconnect Metadata Every active or terminal run must expose: - `run_id` - `session_id` -- current `status`: `queued`, `running`, `awaiting_approval`, - `awaiting_clarify`, `paused`, `cancelling`, `cancelled`, `failed`, - `completed`, or `expired` +- `status`: `queued`, `running`, `awaiting_approval`, `awaiting_clarify`, + `paused`, `cancelling`, `cancelled`, `failed`, `completed`, or `expired` - last committed event cursor / `last_event_id` - terminal state and final result/error when finished - currently available controls @@ -137,179 +143,193 @@ Every active or terminal run must expose: ### Controls -| WebUI control | Required semantics | Runtime endpoint / IPC | +| Control | Required semantics | Target owner | |---|---|---| -| cancel | Request graceful cancellation of the current run; terminal event must follow | `cancel_run` / `interrupt` | -| queue / continue | Append follow-up work to a live, paused, or resumable run/session according to Hermes semantics | `queue_or_continue` | -| approval | Resolve a pending approval request with `allow_once`, `allow_session`, `always`, or `deny` where supported | `respond_approval` | -| clarify | Submit answer text or selected choice for a pending clarify request | `respond_clarify` | -| goal | Set/status/pause/resume/clear goal where Hermes exposes goal capability for this surface | command/capability API | -| observe | Attach to live events and replay from cursor | `observe_run` | -| status | Poll lifecycle state when SSE/WebSocket is unavailable | `get_run` | +| observe | attach to live events and replay from cursor | adapter API surface backed by runtime/journal | +| status | poll lifecycle state when SSE/WebSocket is unavailable | adapter API surface backed by runtime/journal | +| cancel | request graceful cancellation; terminal event follows | runner/runtime control plane | +| queue / continue | append follow-up work according to Hermes semantics | runner/runtime control plane | +| approval | resolve pending approval by id with supported choices | runner/runtime control plane | +| clarify | answer pending clarify request by id | runner/runtime control plane | +| goal | set/status/pause/resume/clear goal where capability exists | runtime command/capability plane | -WebUI may keep local UI state such as which disclosure rows are expanded, but it -must not infer or privately mutate runtime state for these controls. +WebUI may keep presentation state such as expanded rows, selected tabs, and local +scroll position. WebUI must not privately mutate runtime truth for these controls. -## Hermes Runtime API / IPC v0 minimum +## Artifact 2: Runtime State Inventory and Classifier -The transport can be HTTP, stdio IPC, websocket, or another Hermes-owned local -protocol. The key requirement is the semantic contract: Hermes owns the run id, -lifecycle, event cursor, controls, pending human-interaction state, and terminal -state. +Classifications: -### `start_run` +- `runner process`: should be owned by the eventual execution runner / runtime + backend, not the main WebUI request process. +- `journal`: should be captured in append-only durable events for replay and + diagnostics. +- `adapter API surface`: should be exposed through a WebUI-owned boundary that + can later switch backend implementations. +- `WebUI presentation cache`: may remain local because it is not execution truth. -Creates a Hermes-owned run. +| Current primitive | Current legacy source of truth | Target classification | Future backend mapping | Slice 1 handling | Notes / gap | +|---|---|---|---|---|---| +| `STREAMS` / `STREAMS_LOCK` | `api.state_sync` process memory | adapter API surface + presentation fan-out | WebUI runner or future Hermes run observation API | keep live path; mirror events into journal | Must stop being authoritative for active run existence. | +| `CANCEL_FLAGS` | `api.state_sync` process memory | runner process | cancel/interrupt endpoint or runner control | no control-flow change | Final cancel state must return as a replayable event. | +| cached `AIAgent` objects / `AGENT_INSTANCES` | `api/config.py` process memory | runner process | runner-owned Hermes integration | unchanged | Moving this is deferred until after journal proof. | +| background thread lifecycle | `_run_agent_streaming` in `api/streaming.py` | runner process | runner-owned execution lifecycle | unchanged | Slice 1 must not rewrite thread/control flow. | +| token / partial text buffers | streaming callbacks and browser SSE state | journal + presentation cache | replayable runtime events | append emitted events | Browser can cache rendered state, but replay must rebuild it. | +| reasoning buffers | streaming callbacks and UI rendering state | journal + presentation cache | replayable reasoning events | append emitted events | Thinking cards must survive reconnect. | +| tool buffers / live tool calls | WebUI streaming callbacks | journal + presentation cache | replayable tool lifecycle events | append emitted events | WebUI owns rendering, not tool execution state. | +| approval callbacks / queues | live Python callbacks | runner process + adapter API surface + journal | approval state/control endpoint | journal request/resolution events only | Pending approval must eventually survive WebUI restart. | +| clarify callbacks / queues | live Python callbacks | runner process + adapter API surface + journal | clarify state/control endpoint | journal request/resolution events only | Pending clarify must eventually survive WebUI restart. | +| per-request `HERMES_HOME` env mutation lock | `api/streaming.py` / config helpers | runner process | runner/profile execution context | unchanged | Long-term runner must isolate profile env without process-global mutation. | +| session-to-active-run mapping | session JSON + active stream ids + memory | journal + adapter API surface | runtime run registry/session mapping | journal run metadata | Reopen session must discover active/completed run. | +| title generation state | WebUI callbacks/session saves | journal + presentation cache | runtime/session title event | append title events | WebUI may display title updates after event receipt. | +| usage accounting state | WebUI callbacks/session saves | journal + presentation cache | runtime usage event/source of truth | append usage events | Avoid divergent WebUI-only accounting. | +| command capability metadata | WebUI command registry + Hermes command assumptions | adapter API surface | runtime command/capability metadata | unchanged | Unknown command support should not be guessed by WebUI. | +| voice mode state | browser/UI + streaming path | presentation cache + adapter API surface | runtime input/control capability | unchanged | Acceptance tests must pin voice behavior before migration. | +| project/workspace context | WebUI session/workspace state + env mutation | adapter API surface + runner process | runtime run context | unchanged | Must preserve workspace-aware chat and project context. | -Input fields: +Unclassified state is a design blocker. If an implementation slice discovers a +runtime primitive that does not fit this table, update the RFC before landing code. -- `session_id` or instruction to create one -- user message / queued input -- workspace context and attachments metadata -- profile/provider/model/toolset hints -- source/surface metadata, e.g. `source=webui` -- optional command intent, e.g. `/goal` if parsed by WebUI command UI -- idempotency key for duplicate browser submissions +## Artifact 3: Acceptance Test Catalog -Output fields: +These are the user-observable behaviors that must survive the migration. The +catalog should become automated tests where practical. Where full automation is +not feasible in the first slice, the PR must include the strongest practical +diagnostic or manual validation plan. -- `run_id` -- `session_id` -- initial `status` -- `observe` cursor / first event id -- supported controls for this run +| Behavior | Acceptance criterion | Why it matters | First slice that must prove it | +|---|---|---|---| +| Restart/reconnect mid-stream | start a run, restart only WebUI, reload browser, replay/catch up from cursor, final state matches | proves active work no longer depends only on WebUI process memory | journal/replay slice | +| Terminal replay | completed/failed/cancelled runs replay terminal state and do not duplicate transcript content | prevents stale spinner and duplicate-message regressions | journal/replay slice | +| Cancel during tool call | cancel emits one terminal cancelled state and no stale writeback | catches historical stream ownership races | control migration slice | +| Cancel during reasoning | partial/reasoning content is preserved cleanly and final state is not provider-error | catches cancellation classification regressions | control migration slice | +| Approval request/response | approval survives observation, browser response reaches runtime, result is replayable | approval callbacks are cross-cutting and easy to orphan | approval migration slice | +| Clarify request/response | clarify survives observation, browser response reaches runtime, result is replayable | same risk as approval, different UI/control path | clarify migration slice | +| Slash commands | `/compress`, `/branch`, `/retry`, and other supported commands keep current semantics | command behavior should not be reimplemented ad hoc | command capability slice | +| Model switch mid-session | provider/model changes route through the correct runtime context | prevents provider/source-of-truth drift | adapter control slice | +| Workspace context | run receives the session workspace and attachments context | preserves workbench value | adapter control slice | +| Multi-profile isolation | profile-specific runs write/read the correct Hermes home and memory | protects #2134-family isolation concerns | runner/profile slice | +| Queue/continue | follow-up input during live/resumable work obeys Hermes semantics | prevents parallel continuation model | control migration slice | +| Goal continuation | goal status/control survives the adapter boundary | goal logic is lifecycle-sensitive | goal capability slice | +| Voice mode | voice-originated input uses the same run/event/control contract | prevents alternate input path drift | adapter parity slice | +| Projects context | project metadata remains visible and correct across run replay | preserves session/workbench organization | adapter parity slice | -### `observe_run` +## Artifact 4: Slicing Plan and Reversibility -Streams ordered run events, with replay from a cursor. +### Slice 0: Spec PR -Required behavior: +Scope: -- support `after_seq` or `Last-Event-ID` -- emit events in monotonically increasing per-run order -- replay terminal `error` / `done` state for completed runs -- make duplicate delivery safe for reconnecting clients -- preserve enough history for short WebUI restarts and browser reloads +- this RFC update, +- no runtime behavior change, +- no streaming hot-path code change. -### `get_run` +Revert path: revert the docs PR. -Returns current lifecycle state without consuming the event stream. +### Slice 1: Append-only journal/replay beside the legacy path -Required fields: +Pre-authorized only after this spec is reviewed and accepted in #1925. -- `run_id`, `session_id`, `status` -- `created_at`, `updated_at`, optional `completed_at` -- `last_seq` / `last_event_id` -- active controls -- pending approval/clarify summaries -- terminal result/error summary -- usage/model/provider/profile/toolset summary where available +Scope: -### `cancel_run` / interrupt +- add an append-only event journal alongside existing callback paths, +- capture the event families in Artifact 1, +- persist run metadata, cursor, terminal state, and safe diagnostic fields, +- allow reconnect to replay from a cursor and then continue live observation, +- keep `_run_agent_streaming` control flow unchanged, +- keep cancellation, approval, clarify, queue, and goal behavior unchanged. -Requests graceful run cancellation or interruption. Hermes owns the final state -transition and emits a terminal event. WebUI should not directly toggle a local -cancellation flag as the source of truth. +Non-goals: -### `queue_or_continue` +- no runner process, +- no sidecar, +- no adapter interface that changes control flow, +- no replacement of `STREAMS` as the live delivery path, +- no speculative rewrite of agent construction/caching. -Submits follow-up work for a live, paused, or resumable run/session. Semantics -must match Hermes-native queue/continue behavior so WebUI does not create a -parallel continuation model. +Revert path: -### `respond_approval` +- disable journal writes/replay behind one small integration seam, +- retain legacy WebUI streaming path unchanged. -Resolves a pending approval request by id. +Success criterion: -Required behavior: - -- validate the approval belongs to the run/session -- accept only supported choices -- emit `approval.resolved` -- continue, pause, or fail the run according to Hermes approval semantics - -### `respond_clarify` - -Resolves a pending clarification request by id. - -Required behavior: - -- validate the clarify request belongs to the run/session -- accept text or selected-choice payloads -- emit `clarify.resolved` -- continue or fail the run according to Hermes clarify semantics - -## Gap matrix - -| Current WebUI primitive | Current role | Hermes-owned target | Temporary shim allowed? | Notes / gap | -|---|---|---|---|---| -| `STREAMS` / `STREAMS_LOCK` | Process-local live stream registry and subscriber fan-out | Hermes run registry + `observe_run` replay/fan-out | Yes, adapter may keep per-browser SSE connections only | Shim must not be the run source of truth and must survive WebUI restart by re-observing Hermes. | -| `CANCEL_FLAGS` | Local cancellation signal checked by WebUI-owned agent thread | `cancel_run` / interrupt control | No, except translating button clicks into runtime calls | Cancellation result must come back as Hermes status/events. | -| `AGENT_INSTANCES` | Cached `AIAgent` objects inside WebUI process | Hermes Agent runtime owns agent construction/reuse | No | Keeping this in the adapter would recreate the runtime surrogate. | -| Partial text buffers | Reconstruct live assistant deltas for browser reconnect/render | Hermes event log/cursor plus WebUI renderer cache | Short-lived presentation cache only | Source should be replayed token events or persisted transcript, not WebUI-only execution state. | -| Reasoning buffers | Preserve streamed reasoning/thinking text | Hermes reasoning events + replay | Short-lived presentation cache only | Replay must rebuild the same thinking cards after refresh. | -| Tool buffers / live tool calls | Render tool cards and updates | Hermes tool lifecycle events + replay | Short-lived presentation cache only | WebUI owns card rendering, not tool execution state. | -| Approval callbacks and queues | Bridge WebUI buttons to a live Python callback | Hermes pending approval state + `respond_approval` | No private callback queue | Pending approval must be discoverable after WebUI restart. | -| Clarify callbacks and queues | Bridge WebUI form to a live Python callback | Hermes pending clarify state + `respond_clarify` | No private callback queue | Pending clarify must be discoverable after WebUI restart. | -| Command capability metadata | Decide which slash commands render/execute in WebUI | Hermes command registry/capability API with owner/surface metadata | WebUI may cache metadata | Unknown commands should not be reimplemented in WebUI by default. | -| Session-to-active-run mapping | Stored implicitly in WebUI session JSON / active stream ids | Hermes session/run mapping API | WebUI may cache last seen run id | Reopen session must rediscover active/completed run from Hermes. | -| Reconnect/replay behavior | Depends on WebUI process memory and session JSON | `observe_run(after_seq)` + `get_run` terminal state | Browser SSE adapter only | First milestone must prove WebUI restart does not orphan the run. | -| Usage/title/status events | Produced by WebUI streaming callbacks | Hermes usage/title/status events and run state | WebUI formatting only | WebUI can display and persist presentation copies after events arrive. | -| Goal / queue / continue hooks | Mixed WebUI command handling and streaming callbacks | Hermes command/control plane | Only UI affordance shim | Goal support should be driven by Hermes capabilities. | - -## Migration ladder - -1. **Inventory and contract**: keep this RFC current with the current WebUI-owned - runtime primitives and browser event/control contract. -2. **Hermes Runtime API / IPC v0**: add or stabilize upstream Hermes primitives - for `start_run`, `observe_run`, `get_run`, `cancel_run`, and replayable event - cursors. -3. **Read-only observation spike**: from WebUI, observe an existing Hermes-owned - run and adapt its events into WebUI-compatible event objects without starting - a WebUI-owned agent thread. -4. **Feature-flagged new-run path**: route new WebUI runs to Hermes-owned - `start_run` behind a flag while preserving the legacy path as fallback. -5. **Restart/reattach milestone**: prove a non-trivial WebUI-started run - survives a WebUI-only restart and browser reload with ordered replay. -6. **Controls migration**: move cancel, queue/continue, approval, clarify, and - goal controls to Hermes-owned endpoints/capabilities. -7. **Parity tests**: compare legacy and adapter event streams for synthetic - token, reasoning, tool, approval, clarify, error, and done scenarios. -8. **Retire runtime surrogate state**: remove normal WebUI chat ownership of - `AIAgent`, cancellation flags, callback queues, and process-local run truth - once parity and fallback criteria are satisfied. - -## First success criterion - -The first implementation milestone is not "basic chat streams through a new -endpoint." The first meaningful milestone is: - -1. Start a non-trivial chat run from WebUI through the Hermes-owned path. -2. Restart only `hermes-webui` while the run is active. -3. Reload or reopen the browser session. -4. Rediscover the same `run_id` from Hermes using `session_id` or last known run - metadata. -5. Replay events from the last cursor with no duplicate visible transcript - content. -6. Render the same token/reasoning/tool/approval/clarify state the workbench - would have rendered without the restart. -7. Cancel the run from WebUI and observe Hermes emit the terminal cancelled +1. Start a non-trivial WebUI run. +2. Restart only `hermes-webui` while the run is active or shortly after terminal state. +3. Reload the browser/session. +4. Rediscover the run from journal metadata. +5. Replay from cursor without duplicate visible transcript content. +6. Render the same token/reasoning/tool/status/terminal state the workbench would + have rendered without the restart. -If this works, WebUI is moving toward a protocol translator over Hermes-owned -execution instead of becoming another runtime with different variable names. +### Slice 2: Adapter interface over the journaled legacy path -## Open questions +Scope: -- Where should the normative Hermes Runtime API / IPC v0 spec live: in - `NousResearch/hermes-agent`, this WebUI RFC, or both with one designated - source of truth? -- What retention window is enough for v0 event replay: active-run memory only, - SQLite-backed event log, or transcript-derived reconstruction plus terminal - state? -- Should WebUI talk to Hermes over the existing API server, an embedded IPC - channel, or a profile-local runtime socket? -- How should multiple clients observing the same run coordinate controls and - pending approval/clarify prompts? -- Which slash commands need surface-specific capability metadata before WebUI - can safely delegate them to Hermes? +- introduce the `RuntimeAdapter` interface only after Slice 1 proves replay, +- implement the first backend as a thin facade over the still-legacy path plus + journal, +- keep the browser event contract stable, +- keep controls routed to existing code until a later control-specific slice. + +Revert path: switch the feature flag back to direct legacy path. + +### Slice 3: Control migration + +Scope: + +- move cancel first, +- then approval, +- then clarify, +- then queue/continue and goal controls, +- each control gets its own acceptance tests and rollback path. + +Revert path: per-control feature flags or route-level fallback to legacy control +handlers. + +### Slice 4: Runner process / sidecar boundary + +Explicitly deferred until Slice 1 has worked in production for at least one +release cycle and the adapter surface has review approval. + +Scope: + +- move long-lived execution out of the main WebUI request process, +- runner owns active execution state, +- main WebUI server observes/replays through the adapter/journal, +- future Hermes CLI/Python/local API or `/v1/runs` backends can be evaluated + behind the adapter. + +Revert path: disable runner backend and fall back to journaled legacy backend. + +## First Meaningful Success Criterion + +The first meaningful milestone is not "basic chat streams through a new module." +It is: + +1. Start a long-running run from WebUI. +2. Restart only `hermes-webui`. +3. Keep the active run observable through durable journal state. +4. Reload the browser/session. +5. Replay/catch up from cursor. +6. Preserve the rendered workbench state without duplicate transcript content. +7. If the run is still active, cancellation still works through the existing + control path until the control migration slice replaces it. + +If this works without moving runtime ownership into a new pile of process-local +globals, the architecture is moving in the right direction. + +## Open Questions + +- What exact storage format should Slice 1 use: SQLite run/event tables, JSONL, + or a hybrid with transcript-derived checkpoints? +- How long should event replay be retained after terminal state? +- Which event fields must be redacted before journal persistence? +- Should the journal live under the WebUI state dir, the session dir, or a + future runtime-specific subdirectory? +- What is the minimum set of synthetic event fixtures needed to compare legacy + rendering with replay rendering? +- Which controls need route-level feature flags before migration? +- If Hermes Agent later ships a durable `/v1/runs` API, which adapter fields map + directly and which remain WebUI presentation concerns? From 5ba5551d05140650ef6da38fb06c1dfa3ad377ef Mon Sep 17 00:00:00 2001 From: Frank Song Date: Thu, 14 May 2026 22:42:15 +0800 Subject: [PATCH 2/2] Clarify runtime adapter replay gates --- docs/rfcs/hermes-run-adapter-contract.md | 54 +++++++++++++++++------- 1 file changed, 39 insertions(+), 15 deletions(-) diff --git a/docs/rfcs/hermes-run-adapter-contract.md b/docs/rfcs/hermes-run-adapter-contract.md index 206b066b..1463d4fa 100644 --- a/docs/rfcs/hermes-run-adapter-contract.md +++ b/docs/rfcs/hermes-run-adapter-contract.md @@ -199,8 +199,10 @@ diagnostic or manual validation plan. | Behavior | Acceptance criterion | Why it matters | First slice that must prove it | |---|---|---|---| -| Restart/reconnect mid-stream | start a run, restart only WebUI, reload browser, replay/catch up from cursor, final state matches | proves active work no longer depends only on WebUI process memory | journal/replay slice | +| Journal replay after refresh/reconnect | reconnect or restart after events have been journaled can replay from cursor without duplicate transcript/tool/reasoning state | proves the browser contract is replayable and duplicate-safe | journal/replay slice | | Terminal replay | completed/failed/cancelled runs replay terminal state and do not duplicate transcript content | prevents stale spinner and duplicate-message regressions | journal/replay slice | +| Interrupted/stale run diagnostics | if WebUI restarts while execution is still owned by the WebUI process, replay shows the last journaled state and a clear interrupted/stale diagnostic instead of pretending the run kept executing | keeps slice 1 honest before a runner exists | journal/replay slice | +| Execution survives WebUI restart | active execution outlives the main WebUI process, reconnect discovers the active run, ordered replay catches up, and controls such as cancel still work | proves execution ownership actually moved out of the request process | runner/sidecar or external-runtime slice | | Cancel during tool call | cancel emits one terminal cancelled state and no stale writeback | catches historical stream ownership races | control migration slice | | Cancel during reasoning | partial/reasoning content is preserved cleanly and final state is not provider-error | catches cancellation classification regressions | control migration slice | | Approval request/response | approval survives observation, browser response reaches runtime, result is replayable | approval callbacks are cross-cutting and easy to orphan | approval migration slice | @@ -255,13 +257,15 @@ Revert path: Success criterion: 1. Start a non-trivial WebUI run. -2. Restart only `hermes-webui` while the run is active or shortly after terminal - state. -3. Reload the browser/session. -4. Rediscover the run from journal metadata. -5. Replay from cursor without duplicate visible transcript content. -6. Render the same token/reasoning/tool/status/terminal state the workbench would - have rendered without the restart. +2. Refresh/reconnect the browser, or restart WebUI after events have already been + journaled. +3. Rediscover the run from journal metadata. +4. Replay from cursor without duplicate visible transcript content. +5. Render the same already-journaled token/reasoning/tool/status/terminal state + the workbench would have rendered without the reconnect. +6. If WebUI restarted while execution was still owned by the WebUI process, show + an explicit interrupted/stale diagnostic rather than claiming the active run + kept executing. ### Slice 2: Adapter interface over the journaled legacy path @@ -303,19 +307,39 @@ Scope: Revert path: disable runner backend and fall back to journaled legacy backend. -## First Meaningful Success Criterion +## First Meaningful Success Criteria -The first meaningful milestone is not "basic chat streams through a new module." -It is: +The first meaningful milestones are deliberately split. + +### Journal / Replay Gate + +This gate belongs to Slice 1. It does not prove active execution survives a WebUI +process restart, because execution is still owned by the WebUI process in this +slice. + +It proves: + +1. A WebUI run emits append-only journal events with stable cursors. +2. Browser refresh/reconnect can replay already-journaled events from cursor. +3. Terminal `done`, `error`, or `cancelled` state replays without duplicate + transcript content. +4. Tool/reasoning/status state can be reconstructed from replayed journal events. +5. If WebUI restarts before execution ownership has moved out of process, the UI + can show a clear interrupted/stale diagnostic for the last journaled run state. + +### Execution-Survives-WebUI-Restart Gate + +This stronger gate belongs to the runner/sidecar or external-runtime slice, not +Slice 1. It proves execution ownership has actually moved out of the main WebUI +request process: 1. Start a long-running run from WebUI. 2. Restart only `hermes-webui`. -3. Keep the active run observable through durable journal state. +3. Keep the active run executing outside the restarted WebUI process. 4. Reload the browser/session. -5. Replay/catch up from cursor. +5. Rediscover the active run and replay/catch up from cursor. 6. Preserve the rendered workbench state without duplicate transcript content. -7. If the run is still active, cancellation still works through the existing - control path until the control migration slice replaces it. +7. If the run is still active, cancellation still works. If this works without moving runtime ownership into a new pile of process-local globals, the architecture is moving in the right direction.