Read agent.max_turns when constructing streaming WebUI AIAgent instances, pass it as max_iterations when supported, and include it in the per-session agent cache signature so budget changes take effect.
Add regression coverage for the config read, constructor kwarg, and cache key.
Closes#1695.
@Patrick-81 reported the bare "AIAgent not available -- check that
hermes-agent is on sys.path" error on a symlinked install (~/Programmes/hermes-agent
linked to ~/hermes-agent). The maintainer's response — three diagnostic
commands plus `pip install -e .` in the agent dir — fixed it for them.
This PR captures both halves of that learning so the next user with the
same shape doesn't have to file a new issue:
1. **Error message diagnostic block.** New helper
`_aiagent_import_error_detail()` in api/streaming.py builds a multi-line
diagnostic when the import fails, including:
- the running Python interpreter
- HERMES_WEBUI_AGENT_DIR (set value, or "(not set)")
- sys.path entries that mention hermes/agent (or "no entries mention..."
— itself a strong diagnostic signal)
- the most-common fix (`pip install -e .` in the agent dir)
- a pointer to docs/troubleshooting.md
The original error message string is preserved as the FIRST line so
existing log scrapers and docs-search keep matching.
Helper is kept as a separate function so it stays out of the hot path
until we actually need to raise — building it on every successful import
would be wasted work.
2. **New docs/troubleshooting.md.** Symptom → Why → Diagnostic commands →
Fix → When-to-file-a-bug template, with one entry to start: the
"AIAgent not available" flow Patrick-81 walked through. Future
recurring failure modes follow the same template. Required a one-line
addition to .gitignore — docs/* is gitignored with an allowlist, and
the new file needed `!docs/troubleshooting.md` to be tracked.
3. **README link.** docs/troubleshooting.md added to the `## Docs` section
so users know where to look first.
13 regression tests in tests/test_1695_aiagent_import_error_detail.py:
9 for the helper output shape (preserves original message line, includes
running python, shows HERMES_WEBUI_AGENT_DIR set/unset both ways, includes
pip-install-e hint, points at troubleshooting doc, lists relevant sys.path
entries when present, says "no entries..." when absent, output is multi-line)
plus 4 for the docs-presence regression (file exists, has the AIAgent
section, includes pip install -e ., describes the diagnostic chain with
readlink + agent/__init__.py verification).
190 streaming/aiagent tests pass after the change. ast.parse on
api/streaming.py clean.
CI failure on prior push was due to the docs/* gitignore swallowing the
new troubleshooting.md file silently — this commit adds the allowlist
entry so the file is tracked.
Switch the per-turn duration fallback from `is not None` to a truthy check so
None, missing-attr, and an explicit 0 all uniformly fall back to time.time().
Without this, a 0 timestamp (e.g. via a buggy migration or manual file edit)
would yield `time.time() - 0` ≈ wall-clock-since-epoch, displaying nonsense
like 'Done in 56 years 4 months ...'. In practice pending_started_at is always
set via int(time.time()) so this is a hardening fix, not a live-bug fix.
Also drop the brittle source-string assertion in the regression test that
pinned the literal expression. The behavioural test
test_done_handler_persists_duration_on_last_assistant_message already proves
the duration field is set; pinning the source line broke twice during the
v0.50.290 release pipeline alone (Opus tightening + maintainer revert).
Fixes#1595
Signed-off-by: Sanjay Santhanam <51058514+Sanjays2402@users.noreply.github.com>
Spliced from #1531 by @Asunfly: take Change-1 only (the actual bug fix +
cache signature inclusion) and skip Change-2 (auxiliary title-route
extra_body change) which is a separate scope concern.
## What
Two surgical fixes in api/streaming.py:
1. Line 1820 — `_cfg.cfg.get(...)` → `_cfg.get(...)`. `get_config()` returns
a plain dict (not a wrapper exposing `.cfg`). The buggy line raised
AttributeError that the surrounding try/except swallowed, so
`_reasoning_config` was always None regardless of what `/reasoning
<level>` had been set to. Verified locally — `api/streaming.py:1959`
already correctly used `_cfg.get(...)` in the same function, so the
same `_cfg` was being read two different ways in one file.
2. Line 1888 — added `_reasoning_config or {}` to `_sig_blob`. Without
this, switching effort mid-session would fail to take effect because
the per-session agent cache key would still match the old entry.
Mirrors how `resolved_provider` / `resolved_base_url` already
participate in the signature.
## Why splice instead of merge #1531 directly
@Asunfly force-pushed a Change-2 onto #1531 after the original review
that removes `extra_body={"reasoning": {"enabled": False}}` from
`generate_title_raw_via_aux` (the auxiliary title-generation route).
That intent is reasonable (let operator-configured `extra_body.reasoning`
flow through to the title route) but it touches a different surface and
deserves its own PR.
The narrow concern is operators who selected a reasoning-capable
auxiliary title model without explicitly setting
`reasoning.enabled=False` in the task config — pre-Change-2 the WebUI
defended against accidental reasoning on the title hot path; post-Change-2
those configs would reason on every new conversation`s title, with cost
and latency implications.
## What is NOT in this PR
- The `generate_title_raw_via_aux` extra_body refactor (Change-2 from #1531).
- The `test_does_not_override_configured_reasoning_extra_body` test (guards
Change-2). Asunfly can re-open that as its own focused PR.
## Tests
Two new R17b/R17c regression assertions in tests/test_regressions.py:
- `test_streaming_reads_reasoning_effort_from_config_dict` — static-source
guard: `_cfg.cfg` must not return to streaming.py
- `test_streaming_agent_cache_signature_includes_reasoning_config` —
catches removal of `_reasoning_config` from `_sig_blob`
## Closes
- Closes#1531 (the Change-1 portion ships here; Asunfly can re-open
Change-2 as a separate PR if desired)
Co-authored-by: Asunfly <[email protected]>
Read configured max_tokens from config.yaml, pass it into WebUI-created AIAgent instances when supported, and include it in the agent cache signature. Also classify OpenRouter quota phrasing such as more credits, can only afford, and fewer max_tokens.
Adds regression coverage for max_tokens propagation, cache signature isolation, and quota error classification.
PR #1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a
sibling leak at the LRU eviction site that I caught during pre-review:
api/streaming.py SESSION_AGENT_CACHE.popitem(last=False) was discarding
the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db
was dropped on the floor and only released when GC eventually finalized
the agent — which on a long-running server may be never (cyclic refs,
extension types holding C handles, etc.).
Same fix shape as #1421: capture the evicted entry, call
_evicted_agent._session_db.close() explicitly. SessionDB.close() is
idempotent + thread-safe (with self._lock: if self._conn:), so the
double-close-is-benign property still holds.
5 regression tests in test_v050259_sessiondb_fd_leak.py:
- Source-level: cached-agent reuse path closes before replace
- Source-level: LRU eviction path captures + closes evicted agent
- Behavioral: SessionDB.close() is idempotent (3 calls safe)
- Behavioral: cached-agent reuse with mock — close called exactly once
- Behavioral: LRU eviction with mock — only evicted agent's DB closes
Full suite: 3615 passed, 0 failed.
Nathan explicitly authorized 'just go ahead and merge it as a small release'
since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and
matches the empirically-confirmed leak shape (73-handle leak at EMFILE).
SessionDB WAL handles leak when streaming.py creates a new SessionDB
instance per request and replaces the cached agent's _session_db without
closing the old one. Each orphaned connection holds 2 FDs (.db +
.db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages.
Fix: close the previous _session_db before replacing it on cached
agents, mirroring the close-before-replace pattern used elsewhere in the
codebase.
Opus pre-release advisor caught a 5th issue not covered by my initial
follow-up sweep, this one CRITICAL: PR #1402#493 per-session toolset
override silently no-op'd every time.
Bug: api/streaming.py:1755 called _session_meta.get('enabled_toolsets') on
the result of Session.load_metadata_only(). It returns a Session INSTANCE,
not a dict. .get() raised AttributeError, which the surrounding bare
except swallowed silently. The toolset chip in the UI saved correctly to
disk, but the streaming agent always ran with global toolsets.
Fix: use getattr(_session_meta, 'enabled_toolsets', None).
Two new regression tests:
- Source-level: forbid the .get() / [] dict-access shape.
- Runtime: Session.load_metadata_only must return a Session instance.
Full suite: 3604 passed, 0 failed.
Persist session model_provider separately from model IDs so active/default provider selections like gpt-5.5 remain bare while routing through OpenAI Codex. Keep @provider:model for picker disambiguation and runtime bridging, and preserve explicit OpenRouter plus custom/proxy base_url routing.
Opus pass-2 review of v0.50.251 caught a critical regression in PR
#1375:
The cancel-partial message stored captured tool calls under the
'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so
_sanitize_messages_for_api forwarded the entries to the next-turn
LLM call. But the captured entries use the WebUI internal shape
({name, args, done, duration, is_error}) — they don't have the
OpenAI/Anthropic id + function: {name, arguments} envelope. Strict
providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed
entries. Net effect: the very cancel-then-continue scenario PR
#1375 aimed to improve becomes a hard fail.
Fix:
- Rename the persisted key to '_partial_tool_calls' (underscore-
prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize
correctly strips it).
- Update static/messages.js hasMessageToolMetadata check to also
recognize _partial_tool_calls for UI rendering.
- Update test_issue1361_cancel_data_loss.py assertion to check
_partial_tool_calls (and tool_calls as legacy fallback).
Plus 2 NIT fixes from the same Opus review:
NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency
with other _PROFILE_ID_RE callers in the codebase. The trailing-
newline footgun ($ matches before final \n in re.match) is now
closed. Without #1373's is_dir() guard, a name like 'valid\n' would
have created a directory named 'valid\n' on Linux. Doesn't escape
<HERMES_HOME>/profiles/ via Path joining, but unintended.
NIT 2 (test_issue798.py): R19j coverage gaps — added trailing-
newline tests, length-boundary tests (64-char valid, 65-char
rejected), single-char minimum, and non-ASCII / Unicode-trick tests.
New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py):
- test_partial_tool_calls_field_not_forwarded_to_llm: pins that
sanitize-for-API strips _partial_tool_calls + reasoning + does
NOT have tool_calls on a partial message
- test_legitimate_tool_calls_are_preserved_for_completed_turns:
pins that real OpenAI-shape tool_calls on completed turns survive
sanitize unchanged
Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
Three distinct data-loss paths fixed:
§A — Reasoning text was accumulated in a thread-local _reasoning_text
inside _run_agent_streaming. cancel_stream() never saw it because it
went out of scope when the thread was interrupted. Now mirrored to a
new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated
in on_reasoning() and the reasoning branch of on_tool(), read in
cancel_stream().
§B — Live tool calls in thread-local _live_tool_calls were similarly
invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS
on tool.started + tool.completed.
§C — Reasoning-only streams produced no partial message because the
thinking-block regex strip returned empty string and the `if _stripped:`
guard skipped the append. Now appends the partial message when EITHER
content text, reasoning trace, OR tool calls exist.
Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly:
same dict creation in _run_agent_streaming, same _live_config fallback
in cancel_stream, same cleanup in _periodic_checkpoint.
8 regression tests in tests/test_issue1361_cancel_data_loss.py
covering all three sections plus tools+text combinations.
Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247).
- api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter.
- static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'.
- static/messages.js clears 'Uploading...' composer status after upload completes.
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
* fix(streaming): fallback to model_metadata for context_length when compressor missing (#1318 follow-up)
PR #1318 (shipped in v0.50.246 via PR #1341 + commit a5c10d5) persisted
context_length on the session so the context-ring indicator survives
page reloads. But the writer only fired when agent.context_compressor
was present and reported a non-zero value. Fresh agents, interrupted
streams, or compressors without the attribute would still leave
s.context_length=0 — and the indicator would still show 0% on reload.
This follow-up adds a fallback that calls
agent.model_metadata.get_model_context_length(model, base_url) when the
compressor didn't populate the value. The function returns a sensible
static context window for any known model (with a 256K default for
unknown models). Wrapped in a broad try/except because older
hermes-agent builds may not expose the helper.
Sourced from PR #1344 (@jasonjcwu) — extracted into this focused
follow-up after #1344 was closed as superseded by #1341.
Adds 6 structural tests covering: import + call presence, falsy-gate,
agent.model/base_url passing, exception swallowing, save() ordering,
result assignment.
Closes the data-flow gap in #1318 for the compressor-missing case.
* test: relax pr1341 block-size assertion to accommodate the new fallback
---------
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Pre-release Opus + nesquena review on v0.50.246 caught that PR #1341
added the data-structure scaffolding (Session.__init__ accepts the 3
fields, save() persists them, compact() exposes them, GET /api/session
returns them) but did NOT add the writer that actually populates them.
Without a writer, the user-visible bug (context-ring shows 0% after
page reload) was NOT fixed by #1341 alone — the fields stayed None
forever because nothing wrote to s.context_length anywhere.
Adds the writer at api/streaming.py:2188 (post-merge per-turn save block,
before s.save()) so the values from agent.context_compressor land on
disk and survive page reloads.
Also moves the SSE usage payload comment to clarify that the live SSE
payload and the session-level persistence are now distinct paths
(payload below, persistence above).
Adds tests/test_pr1341_context_window_persistence.py — 6 structural +
round-trip tests covering Session __init__/save/compact, the routes
response, and the streaming.py writer placement.
Closes#1318 (the actual user-visible bug, not just the scaffolding).
Pre-release Opus review on v0.50.246 caught a SHOULD-FIX in PR #1338's
cancel_stream synthesis: the symmetric substring guard
(_pending_user in _last_content OR _last_content in _pending_user) was too
loose. Common confirmation replies ("ok", "yes", "go") in the prior turn
would match longer follow-up prompts ("ok please continue"), the synthesis
would be skipped, and the user's typed text would be lost — exactly the
data-loss bug #1298 was supposed to fix.
The fix: gate the substring check on a timestamp comparison. Only treat
the latest user turn as 'already merged by the streaming thread' if its
timestamp is at or after pending_started_at. Earlier turns whose content
happens to be a substring of the pending must not short-circuit synthesis.
Also drops the symmetric (_last_content in _pending_user) branch — that
direction was the false-positive vector. Keeps the equality and prefix
match (workspace-prefix tolerance from the streaming thread).
Adds tests/test_issue1298_cancel_and_activity.py::
test_cancel_synthesizes_when_prior_turn_content_is_substring_of_pending —
regression for the exact 'ok' → 'ok please continue' scenario.
From PR #1338. Already independently APPROVED by nesquena before being absorbed into v0.50.246.
CHANGELOG entries from this PR were dropped during squash (the v0.50.245 section is already
shipped); they will be re-added under [v0.50.246] in the release commit.
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
The agent cache stores full AIAgent instances (each holding complete
conversation history) without size limit. Long-running servers with
many sessions can accumulate unbounded memory usage.
Changes:
- Replace dict with OrderedDict for LRU tracking
- Add SESSION_AGENT_CACHE_MAX = 50 limit
- Evict least-recently-used entries when cache exceeds limit
- Call move_to_end() on cache hits to maintain LRU order
This prevents memory exhaustion on servers with many active sessions.