Commit Graph

186 Commits

Author SHA1 Message Date
Hermes Agent 75a2464821 stage-359: apply Opus SHOULD-FIX — symmetric runtime-field clearing on snapshot load-and-mark path 2026-05-15 15:27:24 +00:00
Hermes Agent fb8b91019e Merge pull request #2295 into stage-359
fix: clear runtime fields on compression snapshots (ai-ag2026)

# Conflicts:
#	CHANGELOG.md
#	api/streaming.py
2026-05-15 15:06:35 +00:00
Hermes Agent 4826a31fbc Merge pull request #2285 into stage-359
fix: hide pre-compression snapshots from sidebar (dso2ng, refs #2230)

# Conflicts:
#	CHANGELOG.md
2026-05-15 14:55:19 +00:00
Frank Song cadcf983d5 Tighten silent failure shrink detection 2026-05-15 18:04:53 +08:00
Dennis Soong eb31b4ed1e test: tighten compression snapshot preservation coverage 2026-05-15 17:31:37 +08:00
ai-ag2026 3a4259476d fix: clear runtime fields on compression snapshots 2026-05-15 09:20:19 +02:00
Dennis Soong bfccdc5c94 fix: hide pre-compression snapshots from sidebar 2026-05-15 11:20:17 +08:00
fxd-jason 1e80b51560 fix: align usage-overwrite test FakeAgent with real agent message format
The FakeAgent in test_issue1857_usage_overwrite returned only 2 messages
(user + assistant) without the conversation history. The real agent always
returns the full history plus new messages. This mismatch caused the new
_has_new_assistant_reply helper (which checks only messages beyond the
pre-turn offset) to see len(result)==len(prev) and incorrectly flag the
turn as a silent failure.

Fix: prepend conversation_history to the FakeAgent's response so the
message list mirrors production behavior.
2026-05-14 14:48:08 +08:00
fxd-jason 120ec5eba2 fix: silent failure detection scans only new messages, not full history
When a provider error (401/429/rate-limit) causes the agent to return
without producing a new assistant reply, the WebUI should emit an
apperror event so the user sees an inline error. However, the detection
logic scanned ALL messages in result['messages'] — which includes the
full conversation history. If any prior turn had an assistant response,
_assistant_added would be True and the apperror would be silently
skipped, leaving the user staring at a blank response.

Extract a helper _has_new_assistant_reply(all_messages, prev_count)
that only inspects messages beyond the pre-turn history offset. Apply
it to both the main detection path and the self-heal/retry path.

Tests: 15 new cases covering history masking, empty content, whitespace,
edge-case shrinks, and multi-assistant scenarios.
2026-05-14 14:34:19 +08:00
Hermes Agent 3d34a72ee8 stage-353: apply Opus SHOULD-FIX — unconditional parent_session_id stamp on compression rotation
Opus identified that PR #2227's preservation block had two related bugs in
the parent_session_id handling:

1. During preservation save: code did
     _old_parent = s.parent_session_id
     s.parent_session_id = None
     s.save(touch_updated_at=False, skip_index=True)
     s.parent_session_id = _old_parent
   The save persisted parent=None to disk. The in-memory restoration didn't
   reach the disk copy. Result: a /branch fork session that subsequently
   compressed lost its 'Forked from X' badge on the preserved old snapshot.

2. Stamping the continuation: code did
     if not s.parent_session_id:
         s.parent_session_id = old_sid
   The 'if not' guard skipped the stamp when the session already had a
   parent_session_id from a prior fork. Result: fork-of-fork compression
   broke lineage — the continuation jumped back to the original fork parent
   instead of the just-preserved immediate predecessor snapshot.

Fix (matches Opus's recommendation):
  - Remove the parent clearing during preservation save (preserve as-is)
  - Drop the 'if not' guard; always stamp continuation to old_sid

This makes the lineage chain consistent: new → old → old.parent → ... root.
Traversal from the continuation always walks through the just-preserved
snapshot to get to its parent's parent, never jumping over the snapshot.

Two new regression tests pin both invariants:
  - test_parent_session_id_stamped_unconditionally (no 'if not' guard)
  - test_old_session_parent_preserved_during_archive_save (no parent=None)

Both pass against the fix. All 8 tests in the file pass.
2026-05-14 03:59:02 +00:00
RØG3R L!M4 5bbf18324c fix: preserve session history during compression rotation (#2223)
The previous implementation renamed old_sid.json → new_sid.json during
context compression, destroying the only persistent copy of the full
conversation history. If the summarisation LLM call also failed, the
user was left with zero recoverable messages.

Fix:
- Remove the destructive old_path.rename(new_path) call
- Preserve old_sid.json as an immutable pre-compression archive
- Create new_sid.json as a fresh file via s.save()
- Set parent_session_id on the continuation session for lineage
- Save in-memory messages to old_sid.json if they're newer than disk

Test: test_issue2223_compression_no_rename.py (6 tests, all passing)
2026-05-14 03:02:44 +00:00
Frank Song 28ec3af697 fix: strip only leading user-asking wrapper line
Refs #2215 Fix B: remove the mid-response stripping hazard without losing leading multi-line wrapper cleanup.

The pattern now strips only a leading 'the user is asking' wrapper line and preserves the visible answer that follows. Add regression coverage for both the leading-wrapper and mid-response prose cases.
2026-05-14 09:14:28 +08:00
Frank Song dc213d47b8 fix: preserve literal thinking tags 2026-05-14 07:13:34 +08:00
Hermes Agent 7209e89ef4 stage-350: apply Opus SHOULD-FIX — tighten _partial_already_present dedup scope
Opus flagged that PR #2151's cancel-handler partial-dedup loop used a
substring check that was too broad: any short prior assistant reply
('OK', 'Here is the answer:') would dedup a longer new partial containing
it, silently dropping the partial and resurrecting the #893 data-loss bug.

Tightened to only dedup against actual prior _partial=True markers with
exact (whitespace-stripped) content match. Three new regression tests
added (short-non-partial-prefix-does-not-dedup, exact-partial-match-still-
dedups, same-content-non-partial-does-not-dedup).

10/10 partial-cancel tests pass after the fix. Also updated CHANGELOG with
the conflict-resolution notes for #2151 vs #2136 and the #2178 test-fix.
2026-05-13 21:11:01 +00:00
Hermes Agent 3f851051cf Merge pull request #2151 into stage-350
fix: clarify cancelled chat turn status (Jordan-SkyLF)

Conflict resolution on api/streaming.py:4549-4567 (the cancel-handler
ownership guard). Both this PR and the already-shipped PR #2136 add a
guard at the same site against stale stream writebacks, from different
angles:

  - PR #2136 (HEAD): _stream_writeback_is_current(_cs, stream_id) — strictly
    dominates by checking the active_stream_id token equality.
  - PR #2151: 'worker won the race' check via (active_stream_id != stream_id
    and not pending_user_message), with _emit_cancel_event = False to suppress
    the terminal cancel event.

Resolution merges both: keep #2136's strictly-stronger condition for skip
detection, and adopt #2151's _emit_cancel_event = False semantic so the
cancel event isn't emitted in addition to skipping the writeback (when
client may have already received the successful done payload).

55/55 tests pass across cancelled-turn-status + stale-stream-writeback +
the four cancel/data-loss sibling test files.
2026-05-13 20:44:44 +00:00
Hermes Agent 7150e9fe70 Merge pull request #2202 into stage-349
feat: show early session titles on chat start (Jordan-SkyLF)
2026-05-13 19:03:03 +00:00
Jordan SkyLF 0381294f1c feat: add early session provisional titles 2026-05-13 11:37:11 -07:00
MrFant 520795fdd2 fix: preserve reasoning_content in API message whitelist
Providers like Xiaomi MiMo, DeepSeek, and Kimi require reasoning_content
to be echoed back on every assistant message in multi-turn conversations
with tool calls. Omitting it causes HTTP 400: 'The reasoning_content in
the thinking mode must be passed back to the API.'

The WebUI's _sanitize_messages_for_api() strips all fields not in
_API_SAFE_MSG_KEYS before sending conversation history to the LLM API.
reasoning_content was not in this whitelist, so it was silently dropped.

The CLI path (run_agent.py) is unaffected because it has its own
_copy_reasoning_content_for_api() logic that operates on raw message
dicts without going through this filter. This is why the same session
works from CLI but fails from WebUI with HTTP 400.

The fix adds 'reasoning_content' to _API_SAFE_MSG_KEYS so the field
passes through sanitization intact.
2026-05-14 02:29:17 +08:00
Lumen Yang 3289c44fb6 fix: refresh context ring after compression 2026-05-13 14:02:28 +02:00
Frank Song 9ea4f1145d Fix stale stream exception writeback guards 2026-05-13 10:23:03 +08:00
Hermes Agent 20717a0d0a Merge pull request #2136 into stage-345
fix: guard stale stream writebacks (LumenYoung)

Prevents stale WebUI stream workers from writing old results into a session
after that session has already moved on to another stream. Adds new helper
_stream_writeback_is_current() (a token equality check against the session's
active_stream_id) and short-circuits the two finalize/cancel paths when the
worker no longer owns the session writeback.
2026-05-12 23:11:48 +00:00
Jordan SkyLF 112eadc209 fix: address cancelled turn review feedback
- classify string-only CancelledError payloads as cancelled
- centralize cancel marker substring matching
- add targeted regression coverage
2026-05-12 15:43:36 -07:00
Lumen Yang 4b57b202a0 fix: guard stale stream writebacks 2026-05-13 00:05:09 +02:00
Jordan SkyLF e4d16e93c7 fix: clarify cancelled chat turn status 2026-05-12 13:26:49 -07:00
Hermes Agent a06952ab00 Merge pull request #2140 into stage-344
Preserve fallback provider credential hints (closes #2133)

# Conflicts:
#	CHANGELOG.md
2026-05-12 16:12:54 +00:00
Frank Song 76e611d49f Preserve fallback provider credential hints 2026-05-12 20:42:55 +08:00
Michael Lam 265496782a docs: clarify compression anchor helpers 2026-05-12 01:43:16 -07:00
nesquena-hermes d75b59135a stage-341: apply Opus SHOULD-FIX (it i18n + short-circuit logger.debug + docstring)
Opus advisor pass on stage-341 found three surgical items:

1. static/i18n.js:it — PR #2064 branched before stage-340 landed the 'it'
   locale (#2067), missing 9 session_*worktree* keys. Mechanical mirror of
   en/ja position. Italian falls back to English silently without this fix.
2. api/streaming.py — PR #2107's new break short-circuit was silent in both
   the aux and agent title-generation paths. Added logger.debug calls before
   each break so production logs surface the exit shape.
3. api/streaming.py — Expanded _title_should_skip_remaining_attempts docstring
   to document the membership criterion explicitly (vs the implicit
   reasoning-only-burn case it ships with today). Future additions
   (llm_safety_blocked, llm_oauth_quota) have a clear inclusion test.

CHANGELOG updated under the Stage-341 maintainer fixes section to mirror
the stage-340 pattern. All targeted tests pass (57/57 in the affected
modules).
2026-05-12 00:16:33 +00:00
nesquena-hermes e20eb2c784 fix: skip budget-doubling title retry for reasoning-only responses (#2083)
Reasoning models (Qwen3-thinking via LM Studio, DeepSeek-R1, Kimi-K2,
etc.) can burn their entire output budget on hidden reasoning tokens and
emit no visible content. The previous title-generation retry path
classified that as llm_length and doubled the budget — but the second
call produces the same shape, so the retry only doubled the GPU/credit
burn. Repeated across the two prompts in _title_prompts() this came to
~3000 reasoning tokens of GPU work per new chat. On local LM Studio
servers behind a custom: provider (where is_lmstudio=False means
reasoning_effort: none never reaches the model) it manifested as the GPU
never going idle after a prompt.

Fix:
  - _extract_title_response: classify reasoning-bearing empty responses
    as llm_empty_reasoning regardless of finish_reason. The presence of
    reasoning_content is the diagnostic signal, not finish_reason.
  - _title_retry_status: drop llm_empty_reasoning from the retry set.
    Length-truncated responses WITHOUT reasoning still retry (those are
    legitimately recoverable by a larger budget).
  - Add _title_should_skip_remaining_attempts() and break out of the
    prompt-iteration loop on empty-reasoning. A second prompt against
    the same model would produce the same shape.
  - Falls through to _fallback_title_from_exchange for a local-summary
    title.

Tests updated to invert the previous reasoning-retry assertions:
  - test_aux_short_circuits_on_empty_reasoning_without_retrying
  - test_aux_still_retries_finish_length_without_reasoning
  - test_agent_route_short_circuits_on_empty_reasoning_without_retrying
  - test_agent_route_still_retries_finish_length_without_reasoning

Companion agent-side work (LM Studio classifier for custom: providers)
is tracked separately on the hermes-agent side; this WebUI fix is the
belt-and-braces guard so the loop stops regardless of agent classifier
state.

Reported by @darkopetrovic. Closes #2083.

Co-authored-by: darkopetrovic <darkopetrovic@users.noreply.github.com>
(cherry picked from commit efeae4a86e)
2026-05-12 00:04:11 +00:00
nesquena-hermes fd069155af Merge PR #2062 into stage-339
feat: record turn journal lifecycle events
by @ai-ag2026
2026-05-11 17:43:58 +00:00
nesquena-hermes 6a016dae6c Merge PR #2077 into stage-338
Refactor compression anchor visibility helpers
by @franksong2702
2026-05-11 17:17:25 +00:00
ai-ag2026 c864ad47af fix: address turn journal lifecycle review 2026-05-11 17:16:43 +02:00
Frank Song 18124ced62 Refactor compression anchor visibility helpers 2026-05-11 20:56:30 +08:00
Frank Song a0e9c06102 Fix HERMES_HOME skill cache patching 2026-05-11 19:12:02 +08:00
ai-ag2026 4b486f2860 feat: record turn journal lifecycle events 2026-05-11 09:13:25 +02:00
Frank Song 5a445e7562 Fix duplicate assistant transcript merge 2026-05-11 13:09:16 +08:00
nesquena-hermes 97b283c5a4 Merge PR #2039 into stage-335 2026-05-11 00:25:07 +00:00
ai-ag2026 2ead7daa2f fix: expose active run lifecycle in health 2026-05-11 02:15:00 +02:00
Michael Lam d620f4394a fix: prewarm skill imports outside env lock 2026-05-10 15:51:49 -07:00
nesquena-hermes 2377216860 Stage 333: PR #2009 — feat(context): live status tracking during streaming by @dobby-d-elf 2026-05-10 18:16:59 +00:00
nesquena-hermes 22991fa820 Merge remote-tracking branch 'origin/master' into stage-331
# Conflicts:
#	CHANGELOG.md
2026-05-10 18:03:55 +00:00
nesquena-hermes c156e5a256 Stage 331: PR #2006 — fix(compression): stamp profile on continuation session by @qxxaa 2026-05-10 17:09:21 +00:00
nesquena-hermes 9060bdb344 Stage 330: PR #2001 — fix(clarify): honor clarify.timeout config by @franksong2702 2026-05-10 17:07:37 +00:00
nesquena-hermes 7eced19463 Stage 330: PR #2000 — fix(skills): patch module-level caches on per-request profile switch by @qxxaa 2026-05-10 17:07:37 +00:00
dobby-d-elf fecfc5f6db fix: reanchor live context usage updates 2026-05-10 10:31:14 -06:00
dobby-d-elf 56d68b7511 fix: keep live context metering session-scoped 2026-05-10 08:20:37 -06:00
dobby-d-elf 1cf0ff01b5 feat: live context window status tracking during streaming 2026-05-10 06:51:46 -06:00
qxxaa f665e50738 fix: stamp profile on continuation session after context compression
When context compression fires, the agent rotates to a new session_id.
The compression migration block correctly migrates the session lock,
SESSION_AGENT_CACHE, SESSIONS dict, and the session file rename, but
does not ensure s.profile is set on the continuation session.

On the next request, _run_agent_streaming resolves the profile via:

    get_hermes_home_for_profile(getattr(s, 'profile', None))

With s.profile == None this falls back to the default profile's
HERMES_HOME. Memory tool calls then read and write the wrong profile's
MEMORY.md — confirmed by investigation: session 0dfefb (continuation
after compression from a troubleshooting profile session) read memory
at 16% / 1,184 chars with 4 entries, while the troubleshooting profile's
actual state was 72-77% / 5,000+ chars. That reading could only come
from the default profile's bank. Subsequent replace operations failed
because the target entries existed only in the troubleshooting profile.

There are two failure paths:

1. In-memory: if s.profile was None from the start (legacy session or
   one created before this fix), the continuation session object carries
   null through the current request.

2. Persistence: s.save() persists "profile": null to the continuation
   session's JSON file (profile is in METADATA_FIELDS, models.py ~408).
   On the next request, Session.load(new_sid) reads it back as null and
   get_hermes_home_for_profile(None) falls back to the default profile.

Fix: capture _resolved_profile_name at request entry (~line 2019),
immediately after profile home resolution. This is the only point where
profile context is reliable: s.profile if already set, otherwise
get_active_profile_name() — which at that point reads thread-local
storage (_tls.profile) correctly set by the HTTP handler thread via
set_request_profile(). Calling get_active_profile_name() at compression
time instead would be unsafe: the streaming thread is a separate
threading.Thread, does not inherit TLS, and the call would fall back to
the process-global _active_profile which may belong to a different
concurrent tab.

Stamp s.profile in the compression migration block immediately after
s.session_id = new_sid. Guarded by `if not s.profile` so sessions that
already have a profile set are unaffected. A logger.info line records
when the stamp fires, making future investigation straightforward.

Fixes: memory writes bleeding into default profile after compression
Reproduces: reliably on any long non-default profile session that hits
the compression threshold (default: 0.80 context fill)
2026-05-10 09:57:45 +01:00
Frank Song 1bec8070f2 fix(1833): persist compression anchor summary for reload UI 2026-05-10 16:45:16 +08:00
Frank Song 2e6b3601bd fix(clarify): honor clarify.timeout config in webui prompts 2026-05-10 16:05:50 +08:00