Commit Graph

1021 Commits

Author SHA1 Message Date
Isla Liu 2a303de2a3 fix(session): preserve retry budget while journal is still arriving 2026-05-20 20:55:07 +08:00
Isla Liu d5a185d9c6 fix(session): serialize lazy journal retry per session 2026-05-20 20:48:38 +08:00
Isla Liu 9870e8f111 fix(session): address Copilot review — scope tool-card dedupe by stream id + tighten docs
Four code-review comments from the automated Copilot reviewer on this PR:

1. `_journal_tool_already_present` dedupe was session-wide, so a
   legitimately-repeated tool (e.g. a second `terminal: ls` in an
   earlier turn) could cause the retry path to falsely skip
   materializing the recovered tool card.  The helper now takes a
   keyword `stream_id` argument; when supplied, a tool card whose
   `_recovered_stream_id` is set AND differs from the candidate is no
   longer treated as a duplicate.  Untagged tool cards (live tools, or
   tool cards carried over from a pre-tagging core transcript) still
   match, preserving the existing 'core transcript already has this
   tool, don't duplicate' invariant.  Two new tests in
   `TestJournalToolDedupeScoping` cover both legs of the rule.

2./3. The troubleshooting FAQ pointed at `~/.hermes/webui/sessions/session_<sid>.json`
   and `~/.hermes/_run_journal/...`.  The actual sidecar filename has
   no `session_` prefix and the run-journal lives under the WebUI
   sessions dir (`~/.hermes/webui/sessions/_run_journal/<sid>/<stream>.jsonl`,
   default).  Both paths fixed and an explicit note added about
   `HERMES_WEBUI_STATE_DIR` overriding the state root.

4. Drop unused `json` / `queue` / `Path` imports from
   `tests/test_session_lost_response_regression.py` so the file stops
   carrying noise that future linting would flag.
2026-05-20 12:18:03 +08:00
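The stream-scoped dedupe rule from item 1 of commit 9870e8f111 can be sketched roughly as follows. This is a minimal illustration, not the project's `_journal_tool_already_present`: the function name, the `tool` / `args` card keys, and the list-of-dicts transcript shape are assumptions made for the example. Only `_recovered_stream_id` and the keyword `stream_id` argument come from the commit message.

```python
from typing import Any, Optional


def tool_already_present(
    messages: list[dict[str, Any]],
    name: str,
    args: Any,
    *,
    stream_id: Optional[str] = None,
) -> bool:
    """Return True if an equivalent tool card is already in the transcript."""
    for msg in messages:
        # Hypothetical card shape: the "tool" and "args" keys are illustrative.
        if msg.get("tool") != name or msg.get("args") != args:
            continue
        tagged = msg.get("_recovered_stream_id")
        # A card recovered from a different stream is a legitimate repeat,
        # not a duplicate, so it must not suppress the candidate.
        if stream_id is not None and tagged and tagged != stream_id:
            continue
        # Untagged cards (live tools, pre-tagging transcripts) still match.
        return True
    return False
```

Omitting `stream_id` keeps the older session-wide behaviour in this sketch, which is one plausible way to stay compatible with callers that do not know the stream.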
Isla Liu 75a26174aa fix(session): lazily retry run-journal recovery so the interrupted-turn marker self-heals
When the WebUI process restarts mid-stream and sidecar repair runs while
the run-journal for the dead stream is not yet visible on disk (WSL2 9p
/ DrvFs page-cache loss, un-fsynced journal tail on network FS, …),
`_append_journaled_partial_output()` returns False and the marker is
permanently baked with the "no agent output was recovered" wording even
though the journaled tokens appear on disk shortly afterwards.

This commit reframes the recovery contract so the read side can
self-heal:

  * `_interrupted_recovery_marker` gains a `pending_retry=True` mode
    that produces a third wording ("Recovering the partial output …
    reload this session to retry.") and stamps a
    `_pending_journal_recovery` flag.
  * `_apply_core_sync_or_error_marker` now writes that pending-retry
    marker (with `_journal_retry_stream_id`,
    `_journal_retry_attempts`, `_journal_retry_first_seen_ts` meta)
    whenever it cannot recover visible output AND the stream id is
    known. The legacy "no output" wording is reserved for the
    no-stream-id case. The core-sync branch leaves marker emission to
    the existing visible-output check (the core transcript itself is the
    canonical history in that branch).
  * A new `_retry_journal_recovery_in_place(session)` helper re-runs
    `_append_journaled_partial_output(…, dedupe_existing=True)` for the
    latest pending marker. On success the marker is promoted in place to
    the recovered-output wording, the journaled rows are reordered to
    sit above the marker (preserving chronological order), and all
    retry meta is stripped. On failure `_journal_retry_attempts` is
    incremented; after `_JOURNAL_RETRY_MAX_ATTEMPTS` (12) attempts or
    `_JOURNAL_RETRY_GIVEUP_SECONDS` (24h) the marker is demoted to a
    neutral "Partial output may have been lost." wording.
  * `get_session()` cheaply short-circuits via
    `_session_has_pending_journal_retry()` and invokes the helper on
    both cache-hit and cold-load paths when a pending marker is found.
    `metadata_only=True` skips the helper to keep sidebar refresh
    cheap. The retry call runs OUTSIDE the SESSIONS LOCK to avoid a
    deadlock with `session.save()` write paths.

No streaming write path or run_journal fsync behaviour is changed — the
fix is read-side only.
2026-05-20 11:58:26 +08:00
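The retry budget described in commit 75a26174aa might look something like the sketch below. It models only the marker bookkeeping, not journal reading or row reordering. The function name `settle_retry`, the returned state strings, and the choice of `>=` for both limits are assumptions; the meta key names and the 12-attempt / 24-hour limits are taken from the commit message.

```python
MAX_ATTEMPTS = 12               # mirrors _JOURNAL_RETRY_MAX_ATTEMPTS
GIVEUP_SECONDS = 24 * 60 * 60   # mirrors _JOURNAL_RETRY_GIVEUP_SECONDS

RETRY_META = (
    "_pending_journal_recovery",
    "_journal_retry_stream_id",
    "_journal_retry_attempts",
    "_journal_retry_first_seen_ts",
)


def settle_retry(marker: dict, recovered: bool, now: float) -> str:
    """Update a pending marker in place after one recovery attempt.

    Returns "recovered", "pending", "lost", or "unchanged" (hypothetical
    labels standing in for the three marker wordings).
    """
    if not marker.get("_pending_journal_recovery"):
        return "unchanged"
    if not recovered:
        attempts = marker.get("_journal_retry_attempts", 0) + 1
        marker["_journal_retry_attempts"] = attempts
        age = now - marker.get("_journal_retry_first_seen_ts", now)
        # Assumption: the budget is exhausted once either limit is reached.
        if attempts < MAX_ATTEMPTS and age < GIVEUP_SECONDS:
            return "pending"
    # Promotion and demotion both strip every piece of retry meta.
    for key in RETRY_META:
        marker.pop(key, None)
    return "recovered" if recovered else "lost"
```

Passing `now` explicitly rather than calling the clock inside the function keeps the give-up window easy to test.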
nesquena-hermes ed6ee3e067 Stage 388: PR #2607
# Conflicts:
#	CHANGELOG.md
2026-05-20 00:17:48 +00:00
nesquena-hermes a201401236 Stage 388: PR #2524 2026-05-20 00:17:48 +00:00
nesquena-hermes bd819f5e68 Stage 388: PR #2533 2026-05-20 00:17:47 +00:00
nesquena-hermes 7c3dcce1d0 Stage 388: PR #2598 2026-05-20 00:17:47 +00:00
Eleanor Berger 4598adfd04 feat: add Geist Contrast skin 2026-05-20 00:09:06 +00:00
AJV20 cb0850208d fix(session): dedupe messaging transcript timestamps 2026-05-19 19:17:43 -04:00
AJV20 54b6c38578 feat(health): expose WebUI stream runtime diagnostics 2026-05-19 22:48:10 +00:00
AJV20 739c948e74 fix(system): allow browser-only dashboard links 2026-05-19 22:47:55 +00:00
AJV20 612fcd30fe fix: avoid duplicate live tool events 2026-05-19 18:41:08 -04:00
nesquena-hermes 6d43116794 Stage 387: PR #2573 2026-05-19 22:10:20 +00:00
nesquena-hermes cc8ef201be Stage 387: PR #2600 2026-05-19 22:10:20 +00:00
nesquena-hermes 93727897b6 Stage 387: PR #2605
# Conflicts:
#	api/routes.py
2026-05-19 22:10:20 +00:00
nesquena-hermes 1ddb18264e Stage 387: PR #2604
# Conflicts:
#	CHANGELOG.md
2026-05-19 22:08:56 +00:00
nesquena-hermes 4bb60d9b10 Stage 387: PR #2601 2026-05-19 22:08:56 +00:00
nesquena-hermes e63de7c15f Stage 387: PR #2593
# Conflicts:
#	CHANGELOG.md
2026-05-19 22:08:56 +00:00
nesquena-hermes 536a8b7636 Stage 387: PR #2566 2026-05-19 22:08:55 +00:00
Lumen Yang dc5c8168d1 fix(webui): refresh active session on external sidecar updates 2026-05-19 21:34:08 +00:00
Michael Lam 1ebfbf3527 fix: reconcile session metadata counts 2026-05-19 14:28:20 -07:00
keyos ada59d73e6 fix(approval): simplify gateway_keys expression and document race window
Drop the redundant 'if gw_data else []' guard — gw_data is already
guaranteed to be a dict by the 'or {}' fallback above.

Add a one-line comment explaining the peek-without-pop race window:
a concurrent resolver may pop a different gateway entry, but
approve_session is idempotent over the session key set so the
outcome is the same regardless.
2026-05-19 20:56:22 +00:00
keyos 729ed415ff fix(approval): peek _gateway_queues for session-level approval when _pending is empty
During active streaming, dangerous-command approvals go through the
gateway path and are stored in _gateway_queues as _ApprovalEntry
objects, not in _pending. The _resolve_approval_legacy helper only
looked at _pending, so 'Allow for this session' never called
approve_session() — the user clicked Allow, the card vanished, but
the next dangerous command asked again.

Now when _pending has no matching entry, the helper peeks into
_gateway_queues to extract pattern_keys, calls approve_session(),
and marks found_target=True so resolve_gateway_approval also fires.

This commit is re-scoped to peek-only (no agent_session_key round-trip,
no state_db metadata changes).

Includes:
- Import + fallback for _gateway_queues
- Null-safe key filtering in all_keys
- Source-contract test (static) + functional test with
  @requires_agent_modules skip marker for CI
- All comments and docstrings in English
2026-05-19 20:24:05 +00:00
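The peek-without-pop idea from commits 729ed415ff and ada59d73e6 could be sketched as below. The real shapes of `_gateway_queues` and `_ApprovalEntry` are not shown in the log, so this example assumes a mapping from session key to a queue of entries, each carrying a `pattern_keys` list. `ApprovalEntry` and `peek_pattern_keys` are hypothetical names for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ApprovalEntry:
    """Stand-in for the real _ApprovalEntry; only the field the sketch needs."""
    pattern_keys: list = field(default_factory=list)


def peek_pattern_keys(gateway_queues: dict, session_key: str) -> list:
    """Collect pattern keys for a session without removing any queue entry."""
    # `or ()` covers both a missing session and an explicit None value,
    # so no second emptiness guard is needed further down.
    entries = gateway_queues.get(session_key) or ()
    keys: list = []
    for entry in entries:  # iterate only; never pop
        for key in entry.pattern_keys or ():
            # Null-safe filtering plus de-duplication, preserving order.
            if key and key not in keys:
                keys.append(key)
    return keys


if __name__ == "__main__":
    queues = {"sess": deque([ApprovalEntry(["rm -rf", None])])}
    print(peek_pattern_keys(queues, "sess"))
```

Because the queue is left untouched, the gateway resolver can still pop and resolve the same entry afterwards.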
starship-s 37df7d76a4 fix(webui): prevent composer draft rollback on refresh 2026-05-19 13:31:12 -06:00
Michael Lam 5770323188 feat(runtime): add runner adapter facade 2026-05-19 12:06:57 -07:00
AJV20 ebb4dffc7d fix: stream live tool callback events 2026-05-19 14:55:19 -04:00
Lumen Yang 8d2b9d4a16 feat(webui): render indexed context metadata 2026-05-19 18:52:50 +00:00
Bryan Bartley 94ceb66c17 docs: clarify folder-zip cap bounds wall-clock/bandwidth not RSS
Per reviewer note: because the zip streams straight into handler.wfile
(no io.BytesIO buffering), peak memory is bounded by zipfile's per-file
read buffer, not the HERMES_WEBUI_FOLDER_ZIP_MAX_MB cap. Adds a comment
so the next reader doesn't have to trace it to learn the cap's actual
shape.
2026-05-19 13:44:56 -05:00
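The memory claim in commit 94ceb66c17 can be illustrated with a small sketch that writes a zip straight into a write-only stream, with no intermediate `io.BytesIO`. `WriteOnlySink` is a hypothetical stand-in for `handler.wfile`, and `stream_folder_zip` is an illustrative name, not the project's handler. The sketch relies on the standard library's `zipfile` support for non-seekable output.

```python
import zipfile
from pathlib import Path


class WriteOnlySink:
    """Stand-in for a socket-backed wfile: supports write and flush only."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def write(self, data) -> int:
        # Record each chunk so the caller can inspect how large writes were.
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self) -> None:
        pass


def stream_folder_zip(root: Path, out) -> None:
    """Write every regular file under root into a zip on the given stream."""
    root = root.resolve()
    # ZIP_STORED keeps the example simple; compression choice is an assumption.
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # zipfile copies the file in fixed-size reads, so the whole
                # file is never held in memory at once.
                zf.write(path, arcname=path.relative_to(root).as_posix())
```

In this shape a size cap limits how long the transfer runs and how many bytes cross the wire, while peak memory stays tied to the copy buffer.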
nesquena-hermes 6c0f864b10 Stage 386: PR #2587
# Conflicts:
#	CHANGELOG.md
2026-05-19 18:20:47 +00:00
nesquena-hermes 86f52f67b8 Stage 386: PR #2581
# Conflicts:
#	api/streaming.py
2026-05-19 18:20:47 +00:00
nesquena-hermes 0585881511 Stage 386: PR #2583 2026-05-19 18:20:07 +00:00
nesquena-hermes 9a512194d5 Stage 386: PR #2582
# Conflicts:
#	CHANGELOG.md
2026-05-19 18:20:07 +00:00
Michael Lam 0736e45485 fix: dedupe tool-only partial recovery markers 2026-05-19 11:16:21 -07:00
Lumen Yang a8d429775c fix(webui): preserve casual chat compaction guard 2026-05-19 14:34:58 +00:00
AJV20 f93e288214 Fix stale stream recovery writeback race 2026-05-19 10:26:45 -04:00
dobby-d-elf 2a95c1e482 Fix profile-aware assistant display names 2026-05-19 07:17:11 -06:00
Michael Lam 71d8a8fb1b fix: reap terminal shells on shutdown 2026-05-19 04:57:51 -07:00
starship-s 2e9ca283dc fix: display canonical cache hit percentage 2026-05-19 02:27:12 -06:00
Lumen Yang 600bb48970 fix(webui): use active state db for metadata summary 2026-05-19 08:02:43 +00:00
Lumen Yang 6ca63e5815 perf(webui): keep external refresh metadata cheap 2026-05-19 08:02:43 +00:00
Lumen Yang a63ab310b5 fix(webui): preserve reconciled session invariants 2026-05-19 08:02:43 +00:00
Lumen Yang 467ef33a24 feat(webui): reconcile external session updates
When API server runs append messages directly to state.db, reconcile WebUI sidecar sessions with those canonical rows across API responses, model-facing streaming context, and active browser refresh.

Add append-only state.db merge helpers, metadata-only counts for refresh polling, and regression coverage for API visibility, context incorporation, and frontend refresh behavior.
2026-05-19 08:02:43 +00:00
nesquena-hermes 54875f2110 Stage 385: PR #2550 2026-05-19 03:13:47 +00:00
nesquena-hermes d92e44ef5a Stage 385: PR #2568
# Conflicts:
#	CHANGELOG.md
2026-05-19 03:13:47 +00:00
Michael Lam 1827ea3efd fix: add Grok OAuth provider catalog support 2026-05-18 19:51:01 -07:00
Dennis Soong ea978a1989 fix: surface auto-compression handoff 2026-05-19 10:45:43 +08:00
Bryan Bartley 6caf86ba96 feat(workspace): download folder as zip via /api/folder/download
Adds a "Download Folder" item to the workspace file-tree right-click
menu and a GET /api/folder/download endpoint that streams the
directory as a zip with Content-Disposition: attachment.

Configurable caps:
  HERMES_WEBUI_FOLDER_ZIP_MAX_MB    (default 1024)
  HERMES_WEBUI_FOLDER_ZIP_MAX_FILES (default 50000)

Pre-flights the walk so cap-exceeded returns 413 + JSON BEFORE any
zip bytes are sent. Symlinks resolving outside the workspace are
skipped. Mirrors the existing _handle_file_raw shape (session_id
resolution, safe_resolve, RFC 5987 filename via
_content_disposition_value). Stdlib zipfile only; no new dependencies.

Tests: 11 static-inspection tests matching the style of
tests/test_issue1867_upload_size_preflight.py. All passing on
Python 3.11/3.12/3.13.
2026-05-18 21:40:02 -05:00
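The pre-flight walk from commit 6caf86ba96 might be approximated as follows. This is a sketch under assumptions: `FolderTooLarge` and `preflight_folder` are hypothetical names, and the real handler's `safe_resolve` and session handling are not reproduced. The idea shown is only that both caps are checked, and outside-pointing symlinks skipped, before any zip bytes would be written.

```python
import os
from pathlib import Path


class FolderTooLarge(Exception):
    """Hypothetical error that a handler could map to a 413 + JSON response."""


def preflight_folder(root: Path, max_bytes: int, max_files: int) -> list[Path]:
    """Return the files to zip, or raise before any output is produced."""
    root = root.resolve()
    selected: list[Path] = []
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            target = path.resolve()
            # Skip symlinks that resolve outside the root, and anything
            # that is not a regular file (for example a broken link).
            if not target.is_relative_to(root) or not target.is_file():
                continue
            total += target.stat().st_size
            selected.append(path)
            if len(selected) > max_files or total > max_bytes:
                raise FolderTooLarge(f"{len(selected)} files, {total} bytes")
    return selected
```

With the defaults named in the commit, a caller would pass `max_bytes=1024 * 1024 * 1024` and `max_files=50000`, and start streaming only after this function returns.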
nesquena-hermes 0bb8fde586 Mark ControlResult unsafe_hash=False with explainer (Opus advisor followup) 2026-05-18 22:50:45 +00:00
nesquena-hermes 4f90fc5339 Stage 384: PR #2544 2026-05-18 22:44:02 +00:00