Two functions on the /api/session/handoff-summary hot path were opening
sqlite3.connect(...) inside a bare `with` statement, which commits the
transaction at scope exit but does NOT close the connection. Per-turn
invocations accumulated state.db / state.db-wal file descriptors and
CPython heap pages on long-lived worker threads, surfacing as the
multi-GB VmRSS and 6x duplicated state.db fds observed on the live
instance (D0 pre-restart baseline: VmRSS 1,334,248 kB, 55 fds; cold
baseline after restart: VmRSS 136,668 kB, 10 fds).
Wrap both call sites with contextlib.closing(...) (already imported and
used at seven other sites in the same files) so the connection is
closed deterministically:
- api/models.py :: count_conversation_rounds
- api/routes.py :: _persist_handoff_summary_to_state_db
Regression test:
tests/test_issue2233_sqlite_connection_leak.py loops both functions
20 times against a tmp state.db and asserts /proc/<pid>/fd count
does not grow more than 2. Linux-only via sys.platform skip.
D1 live soak against a freshly-built worktree server (port 8799,
isolated HERMES_HOME / HERMES_WEBUI_STATE_DIR) hitting
/api/session/handoff-summary 20 times:
fd_before = 5
fd_after = 5 (growth 0, threshold < 5)
vmrss_before = 52636 kB
vmrss_after = 52636 kB (growth 0 kB, threshold < 30 MB)
The patched fix curve trends below the leak curve.
Rollback: single git revert <this-sha> reverts both file edits.
Refs #2233.
The full rebuild path scans SESSION_DIR via glob('*.json') and appends every loaded session to a plain list without deduplicating by session_id. When old-format session_*.json files coexist alongside WebUI-format xxx.json files (both sharing session_id), the index gets duplicate entries, causing frontend Vue key crashes.
Fix: use dict[session_id -> compact_entry] to naturally deduplicate.
The full rebuild path of _write_session_index scans SESSION_DIR via
glob('*.json') and appends every loaded session to a plain list without
deduplicating by session_id. When old-format session_*.json files coexist
alongside WebUI-format xxx.json files (both sharing the same session_id),
the same session appears multiple times in the index, causing frontend
Vue key collisions and a blank page.
Fix: use dict[session_id -> compact_entry] to naturally deduplicate.
Prefer the entry with the larger message_count when conflicts arise.
Per deep-review verdict SHIP-WITH-FIXES on PR #2636:
1. Profile-switch reconciliation: _refreshProfileSwitchBackground now re-fetches
/api/settings and re-applies hidden_tabs for the new profile. Without this,
Profile A's hidden-tabs choice stayed in effect under Profile B until the
user opened Settings → Appearance.
2. A11y: switched chips from role=button + aria-pressed to role=switch +
aria-checked. The pressed/not-pressed wording confused screen-reader users
because chip-off looks like the off state. Added role=group +
aria-labelledby on the container, and a :focus-visible style on the chips.
3. Server-side belt-and-suspenders: api/config.py now strips 'chat' and
'settings' from hidden_tabs at validation time, matching the client's apply-
time filter. A tampered POST can no longer persist the forbidden values.
3 new regression tests added (chat/settings rejection, profile-switch wiring,
chip a11y attributes).
Co-authored-by: FrancescoFarinola <francesco.farinola@example.com>
Custom providers that have a curated models: list in config.yaml
(e.g. ZenMux gateways) should show ONLY those configured models in
the picker dropdown, not the full /v1/models catalog.
Before this fix, _named_custom_groups unconditionally called
_read_custom_endpoint_models() which would pull hundreds of models
from aggregator gateways and overwrite the user's curated list.
Now the build checks if the custom_provider entry has a non-empty
models dict/list in config.yaml — if so, it skips the live fetch
and uses only the configured models (same behavior as hermes-agent
model_switch.py Section 4 patch).
Closes: configure-model-list-should-be-authoritative
The WebUI clarification popup had a response-delivery failure: users
submitted answers in the popup, but the agent still fell through to the
timeout fallback message. Three bugs conspired:
1. No stable clarify_id — _ClarifyEntry had no unique identifier, so
the frontend could not reference a specific pending prompt. The
backend used FIFO resolution which silently failed for stale/late
responses.
2. Frontend hid the card before confirmation — respondClarify() called
hideClarifyCard(true, 'sent') BEFORE the API call completed. If the
backend rejected the response, the card was already gone and the
user's draft was discarded.
3. Backend lied about success — _resolve_clarify_legacy() returned
bool(resolved) or not bool(clarify_id). Since the frontend never
sent clarify_id, the backend always reported ok:true even when
nothing was resolved.
Changes:
api/clarify.py:
- _ClarifyEntry now auto-generates a stable clarify_id (uuid4.hex[:12])
- submit_pending() injects clarify_id into the data dict visible to the
frontend via SSE and polling
- New resolve_clarify_by_id() for O(1) lookup by id instead of FIFO pop
api/routes.py:
- _resolve_clarify_legacy() uses resolve_clarify_by_id when clarify_id
is provided; returns actual bool result (no more unconditional True)
- _handle_clarify_respond() returns HTTP 409 + {ok:false, stale:true}
when resolution fails
static/messages.js:
- respondClarify() now sends clarify_id in the POST body
- Waits for a positive backend acknowledgement before hiding the card
- Saves a draft copy before POST and restores it on failure
- On 409/network error: re-enables controls, shows error toast
- Guards against parallel-SSE race where clearing the cache after a
successful response could erase a newly queued next prompt (codex P1)
tests:
- Updated test_sprint30.py for new ack-before-hide behaviour
- Updated test_clarify_unblock.py for 409 on stale responses
Closes#2639.