Merge pull request #1534 from nesquena/stage-279

v0.50.279 — 8-PR batch from full PR sweep + Opus MUST-FIX caught
2026-05-25 11:10:18 +00:00 · 2026-05-03 09:26:03 -07:00
parent 9e31a2ac65 11cc493806
commit f8ed6dac05
16 changed files with 287 additions and 25 deletions
@@ -1,5 +1,34 @@
 # Hermes Web UI -- Changelog

+## [v0.50.279] — 2026-05-03
+
+### Fixed (8-PR batch from full PR sweep — closes #1463, #1491, #1503, #1509, #1522)
+
+- **Branch indicator codepoint corrected** (#1523, @franksong2702; closes #1522) — the fork-indicator glyph in the sidebar was rendering `⒂ PARENTHESIZED NUMBER FIFTEEN` (`\u2482`) instead of the intended `⑂ OCR FORK` (`\u2442`). Forked sessions appeared with a mysterious "(15)" prefix that looked like a message count or unread badge; users would click expecting something related to "15" and find nothing. The actual fork indicator was invisible. One-character fix in `static/sessions.js:1657` plus the matching test assertion update.
+
+- **Onboarding API-key field stops losing focus during probe** (#1519, @franksong2702; closes #1503) — the wizard's API-key input had `oninput="_scheduleOnboardingProbe()"` firing a 400ms-debounced probe on every keystroke. When the probe completed, `_renderOnboardingBody()` rebuilt the entire form DOM, destroying the `<input>` element the user was typing into. On localhost the probe completes in ~5-50ms so the bug window was narrow; on slow networks (VPN, corporate proxy, cold-start vLLM) the re-render routinely landed between keystrokes. Especially painful on the password field where users paste long secrets. **Fix:** removed `_scheduleOnboardingProbe()` from the api-key input's `oninput` handler in `static/onboarding.js:200`; added `onblur="_runOnboardingProbe()"` so the probe still fires when the user tabs away. The probe also still fires via the "Test connection" button and `nextOnboardingStep()` before Continue — no flow breakage.
+
+- **Voice-mode pref toggle-off now stops the recognizer** (#1518, @franksong2702; closes #1491) — if a user enabled the hands-free voice mode (PR #1489, v0.50.271), started a conversation, then opened Settings → Preferences and disabled the pref, the button disappeared but the SpeechRecognition kept running. The user had no way to stop it short of reloading the page, and it kept consuming microphone access and battery the whole time. **Fix:** `_applyVoiceModePref()` in `static/boot.js` now reads the pref into a local `enabled` variable and calls `_deactivate()` (the standard cleanup path that stops recognition, clears timers, restores TTS, resets UI state) when `!enabled && _voiceModeActive`. Plus a TDZ-safety hoist: `let _voiceModeActive = false` moved above `_applyVoiceModePref()`. It was previously declared after both the function and its first call, so the new `_voiceModeActive` read would have hit the Temporal Dead Zone and thrown a `ReferenceError` on that first call whenever the pref was off.
+
+- **YAML code blocks render with newlines** (#1516, @franksong2702; closes #1463) — Prism's YAML grammar wraps tokens in `<span class="token …">` elements where `white-space` defaults to `normal`, collapsing `\n` characters into spaces even when the underlying `textContent` preserved them. Plain code blocks and `language-bash` rendered correctly; only `language-yaml` was affected. YAML is one of the most common LLM output formats (config files, docker-compose, CI pipelines, Kubernetes manifests) — flattened YAML in chat is unreadable. **Fix:** two CSS rules in `static/style.css` forcing `white-space: pre !important` on `.msg-body pre code.language-yaml .token` and `.preview-md pre code.language-yaml .token`. Scoped tightly to YAML — no impact on other languages. Verified via the reporter's two diagnostic probes (`textContent` had `\n`, only `language-yaml` was affected) that the renderer pipeline was correct and the fix needed to be at the CSS layer.
+
+- **Service-worker placeholder consolidation** (#1517, @franksong2702; closes #1509) — `__CACHE_VERSION__` (in `static/sw.js`) and `__WEBUI_VERSION__` (in `static/index.html`) were functionally identical: both substituted at request time via `quote(WEBUI_VERSION, safe="")`. Two names existed for historical reasons (different files added at different releases). Naming hygiene flagged by both the independent reviewer and the Opus advisor during the v0.50.276 release review. **Fix:** rename `__CACHE_VERSION__` → `__WEBUI_VERSION__` across `static/sw.js`, `api/routes.py`, `tests/test_pwa_manifest_sw.py`. Pure rename, no behavior change — same `?v=vX.Y.Z` query strings on the same URLs at the wire.
+
+- **WebUI-origin state.db sessions recoverable when JSON sidecar missing** (#1532, @ai-ag2026; refs #1471) — when a WebUI-origin session existed in `state.db.sessions` / `state.db.messages` but the matching `~/.hermes/webui/sessions/<id>.json` sidecar was missing (possible after disk-write failures, partial restore, or interrupted writes), the session was invisible to `/api/sessions` even though the canonical SQLite messages were intact. Root cause: `read_importable_agent_session_rows()` had a hard-coded `s.source != 'webui'` predicate that re-applied the filter even when callers opted out via `exclude_sources=None`. Slice 1 of the #1471 session-recovery class. **Fix:** `api/agent_sessions.py` makes the default exclusion explicit (`("cron", "webui")`) and removes the hard-coded predicate so `exclude_sources=None` actually includes WebUI-origin rows. New regression test `test_webui_state_db_session_without_sidecar_appears_when_agent_sessions_enabled`.
+
+- **Stale runtime stream state cleared proactively** (#1525, @ai-ag2026; refs #1471) — session JSON could retain `active_stream_id` plus paired pending fields (`pending_user_message`, `pending_attachments`, `pending_started_at`) after a stream failure, provider exception, or server restart. `/health` would correctly report `active_streams: 0`, but `/sessions/<id>` would still claim `agent_running` (pure truthiness on `s.active_stream_id`) and the frontend's `INFLIGHT[sid]` would keep the UI busy on a dead stream. Slice 2 of the #1471 session-recovery class, distinct from #1532's "session in DB but no sidecar" path. **Fix:** new `_clear_stale_stream_state()` helper in `api/routes.py` runs proactively at the read boundary (`/sessions/<id>` GET) and before new turns start. It verifies the stream is actually missing from `STREAMS` (the in-memory registry) before clearing; it never expires live streams by age. Frontend half: `static/sessions.js` clears `INFLIGHT[sid]` when the server reports no `active_stream_id`. **Maintainer merge-conflict resolution:** kept the rename-side `CACHE_NAME = 'hermes-shell-__WEBUI_VERSION__'` (post-#1517 rename) over the PR's manual `-stale-stream-cleanup1` suffix. The renamed placeholder still auto-bumps with each release through `quote(WEBUI_VERSION, safe="")`, so the manual suffix was redundant: the natural version bump (v0.50.278 → v0.50.279) already invalidates the old cache via `caches.delete(k)` for `k !== CACHE_NAME` in the SW activate handler. 5 new regression tests in `test_stale_stream_cleanup.py`.
+
+- **WebUI max_tokens forwarded to agent + OpenRouter quota classifier** (#1526, @ai-ag2026; refs #1524) — WebUI agent initialization didn't pass the configured `max_tokens` to `AIAgent`, so provider-native output ceilings could be requested. On OpenRouter this could fail with quota-style HTTP 402 messages like `more credits`, `can only afford`, `fewer max_tokens`. Pre-fix, those phrases weren't classified as quota failures and didn't trigger the fallback chain — users saw raw 402 errors instead of automatic fallback to a less-expensive model. **Fix:** `api/streaming.py` reads configured `max_tokens` from top-level + `agent.max_tokens` fallback, parses positive integers, includes both `max_tokens` and the fallback state in the `SESSION_AGENT_CACHE` signature (so config changes don't reuse a stale cached agent), and passes `max_tokens` to `AIAgent` only when the constructor supports it (uses `inspect.signature(AIAgent.__init__)` rather than a try/except that would swallow real `TypeError`s). Quota classifier additions for the three OpenRouter phrases route to the same fallback chain as existing quota markers. New regression tests in `test_streaming_max_tokens_quota.py`.
+
+### Notes
+
+- 3936 → **3946** tests passing (+9 from constituent PRs + 1 conflict-marker regression guard added in-release per Opus MUST-FIX).
+- Pre-release Opus advisor pass: **caught a MUST-FIX (sw.js merge-conflict markers still in tree despite earlier `git add`/`commit`)** that would have shipped a broken service worker. Resolution applied in stage and a `test_sw_js_has_no_merge_conflict_markers` regression guard added so this can't happen silently again. One SHOULD-FIX (race in `_clear_stale_stream_state` between registry-check and session-mutate) explicitly deferred to follow-up #1533 per Opus's "fine to defer given the narrow window" advice — bounded effect (orphaned stream requires retry, no data corruption).
+- One merge conflict resolved during stage build (#1525 vs #1517 cache-name placeholder collision); resolution drops PR #1525's manual `-stale-stream-cleanup1` suffix in favor of the canonical `__WEBUI_VERSION__` token (natural release-bump preserves the cache-invalidation guarantee).
+- 2 PRs closed as duplicates during sweep: #1528 (identical to #1517) and #1529 (superseded by #1516, `.preview-md` coverage missing).
+- 5 PRs stay on hold: #1418 (hard prereq hermes-agent#18534 not yet merged), #1464 (blocker — `noResults` ternary inverted, awaiting JKJameson fix), #1404 (UX — aronprins width feedback unresolved), #1353 (already `ready-for-review` tagged, durability path needs independent review), #1311 (draft + CONFLICTING).
+- 1 PR routed to maintainer-review: #1531 (Asunfly stowaway change in force-push to title aux generation that wasn't in PR description; awaiting scope decision).
+
 ## [v0.50.278] — 2026-05-03

 ### Added (1 PR — splices best of #1497 + #1513)
@@ -3,7 +3,7 @@
 > Goal: Full 1:1 parity with the Hermes CLI experience via a clean dark web UI.
 > Everything you can do from the CLI terminal, you can do from this UI.
 >
-> Last updated: v0.50.278 (May 03, 2026) — 3936 tests collected
+> Last updated: v0.50.279 (May 03, 2026) — 3946 tests collected
 > Tests: `pytest tests/ --collect-only -q`
 > Source: <repo>/

@@ -1835,8 +1835,8 @@ Bridged CLI sessions:

 ---

-*Last updated: v0.50.278, May 03, 2026*
-*Total automated tests collected: 3936*
+*Last updated: v0.50.279, May 03, 2026*
+*Total automated tests collected: 3946*
 *Regression gate: tests/test_regressions.py*
 *Run: pytest tests/ -v --timeout=60*
 *Source: <repo>/*
@@ -214,9 +214,9 @@ def read_importable_agent_session_rows(
    db_path: Path,
    limit: int = 200,
    log=None,
-    exclude_sources: tuple[str, ...] | None = ("cron",),
+    exclude_sources: tuple[str, ...] | None = ("cron", "webui"),
 ) -> list[dict]:
-    """Return non-WebUI agent sessions projected as importable conversations.
+    """Return agent sessions projected as importable conversations.

    Hermes Agent can create rows in ``state.db.sessions`` before a session has
    any messages, and long conversations can be split into compression-linked
@@ -256,7 +256,7 @@ def read_importable_agent_session_rows(
        ended_expr = _optional_col('ended_at', session_cols)
        end_reason_expr = _optional_col('end_reason', session_cols)

-        where_clauses = ["s.source IS NOT NULL", "s.source != 'webui'"]
+        where_clauses = ["s.source IS NOT NULL"]
        params: list[str] = []
        if exclude_sources:
            excluded = tuple(str(source) for source in exclude_sources if source)
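Reviewer note: a minimal sketch of the opt-out behavior this hunk enables. The hunk does not show how `excluded` is turned into SQL, so the `NOT IN` clause and the `build_where` name below are assumptions for illustration, not the code in `api/agent_sessions.py`.

```python
from __future__ import annotations


def build_where(exclude_sources: tuple[str, ...] | None = ("cron", "webui")):
    """Hypothetical stand-in for the WHERE-clause assembly in the hunk above."""
    where_clauses = ["s.source IS NOT NULL"]
    params: list[str] = []
    if exclude_sources:
        # Drop empty entries, mirroring the `if source` filter in the diff.
        excluded = tuple(str(source) for source in exclude_sources if source)
        if excluded:
            # Assumed SQL shape: one bound parameter per excluded source.
            placeholders = ", ".join("?" for _ in excluded)
            where_clauses.append(f"s.source NOT IN ({placeholders})")
            params.extend(excluded)
    return " AND ".join(where_clauses), params


default_sql, default_params = build_where()
opt_out_sql, opt_out_params = build_where(None)
```

With the hard-coded `s.source != 'webui'` predicate gone, passing `exclude_sources=None` leaves only the `IS NOT NULL` check, so WebUI-origin rows are no longer filtered a second time.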
@@ -233,6 +233,34 @@ from api.helpers import (
    _redact_text,
 )

+
+def _clear_stale_stream_state(session) -> bool:
+    """Clear persisted streaming flags when the in-memory stream no longer exists.
+
+    A server restart or worker crash can leave active_stream_id/pending_* in the
+    session JSON while STREAMS is empty. The frontend then keeps reconnecting to
+    a dead stream and shows a permanent running/thinking state.
+    """
+    stream_id = getattr(session, "active_stream_id", None)
+    if not stream_id:
+        return False
+    with STREAMS_LOCK:
+        stream_alive = stream_id in STREAMS
+    if stream_alive:
+        return False
+    session.active_stream_id = None
+    if hasattr(session, "pending_user_message"):
+        session.pending_user_message = None
+    if hasattr(session, "pending_attachments"):
+        session.pending_attachments = []
+    if hasattr(session, "pending_started_at"):
+        session.pending_started_at = None
+    try:
+        session.save()
+    except Exception:
+        pass
+    return True
+
 # ── CSRF: validate Origin/Referer on POST ────────────────────────────────────
 import re as _re

@@ -1188,7 +1216,7 @@ def handle_get(handler, parsed) -> bool:
            from api.updates import WEBUI_VERSION
            version_token = quote(WEBUI_VERSION, safe="")
            text = sw_path.read_text(encoding="utf-8").replace(
-                "__CACHE_VERSION__", version_token
+                "__WEBUI_VERSION__", version_token
            )
            data = text.encode("utf-8")
            handler.send_response(200)
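Reviewer note: a small sketch of the request-time substitution, to show why the rename is behavior-neutral. `render_sw` is a hypothetical helper; only `quote(..., safe="")` and the `__WEBUI_VERSION__` token come from the diff.

```python
from urllib.parse import quote


def render_sw(template: str, version: str) -> str:
    """Hypothetical helper: substitute the version token as the route does."""
    # safe="" percent-encodes every reserved character, including "/".
    return template.replace("__WEBUI_VERSION__", quote(version, safe=""))


template = "const CACHE_NAME = 'hermes-shell-__WEBUI_VERSION__';"
rendered = render_sw(template, "v0.50.279")
print(rendered)
```

The output depends only on the version string, not on the placeholder's name, so the cache name and `?v=` query strings on the wire are unchanged by the rename.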
@@ -1309,6 +1337,7 @@ def handle_get(handler, parsed) -> bool:
        try:
            _t1 = _time.monotonic()
            s = get_session(sid, metadata_only=(not load_messages))
+            _clear_stale_stream_state(s)
            _t2 = _time.monotonic()
            effective_model = (
                _resolve_effective_session_model_for_display(s)
@@ -1435,6 +1464,7 @@ def handle_get(handler, parsed) -> bool:
            return bad(handler, "Missing session_id")
        try:
            from api.session_ops import session_status
+            _clear_stale_stream_state(get_session(sid, metadata_only=True))
            return j(handler, session_status(sid))
        except KeyError:
            return bad(handler, "Session not found", 404)
@@ -4265,7 +4295,7 @@ def _handle_chat_start(handler, body):
                status=409,
            )
        # Stale stream id from a previous run; clear and continue.
-        s.active_stream_id = None
+        _clear_stale_stream_state(s)
    stream_id = uuid.uuid4().hex
    with _get_session_agent_lock(s.session_id):
        s.workspace = workspace
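Reviewer note: the changelog defers a SHOULD-FIX for the gap between the registry check and the session mutation in `_clear_stale_stream_state`. The sketch below shows one possible way to narrow that gap; it is not the #1533 follow-up. `FakeSession`, the module-level `STREAMS` and `STREAMS_LOCK`, and the function name are stand-ins, and the approach only helps if whatever registers a new stream takes the same lock.

```python
import threading

# Stand-ins for the in-memory registry and lock referenced in the diff.
STREAMS = {}
STREAMS_LOCK = threading.Lock()


class FakeSession:
    """Hypothetical session object with the fields the helper touches."""

    def __init__(self, active_stream_id=None):
        self.active_stream_id = active_stream_id
        self.pending_user_message = "hello"
        self.pending_attachments = ["a.txt"]
        self.pending_started_at = 123.0
        self.saved = 0

    def save(self):
        self.saved += 1


def clear_stale_stream_state_locked(session) -> bool:
    stream_id = getattr(session, "active_stream_id", None)
    if not stream_id:
        return False
    with STREAMS_LOCK:
        # Check and mutate under one lock, and re-verify the id we checked,
        # so a stream registered in between is not cleared by mistake.
        if stream_id in STREAMS or session.active_stream_id != stream_id:
            return False
        session.active_stream_id = None
        session.pending_user_message = None
        session.pending_attachments = []
        session.pending_started_at = None
    # Persist outside the lock to keep the critical section short.
    session.save()
    return True
```

Unlike the shipped helper, this variant lets `save()` errors propagate; whether to swallow them is a separate decision.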
@@ -1792,6 +1792,25 @@ def _run_agent_streaming(
            import inspect as _inspect
            _agent_params = set(_inspect.signature(_AIAgent.__init__).parameters)

+            # CLI-parity max output cap: read config.yaml's max_tokens and pass
+            # it to AIAgent when supported. Without this WebUI-created agents use
+            # provider-native output ceilings (e.g. Claude via OpenRouter can
+            # request 64k), which may turn an otherwise usable fallback into a
+            # 402 "more credits / fewer max_tokens" failure.
+            _max_tokens_cfg = None
+            try:
+                _raw_max_tokens = _cfg.get('max_tokens')
+                if _raw_max_tokens is None:
+                    _agent_cfg_for_tokens = _cfg.get('agent', {})
+                    if isinstance(_agent_cfg_for_tokens, dict):
+                        _raw_max_tokens = _agent_cfg_for_tokens.get('max_tokens')
+                if _raw_max_tokens is not None:
+                    _parsed_max_tokens = int(_raw_max_tokens)
+                    if _parsed_max_tokens > 0:
+                        _max_tokens_cfg = _parsed_max_tokens
+            except Exception:
+                _max_tokens_cfg = None
+
            # CLI-parity reasoning effort: read agent.reasoning_effort from the
            # active profile's config.yaml (the same key the CLI writes via
            # `/reasoning <level>`) and hand the parsed dict to AIAgent.  When
@@ -1830,6 +1849,8 @@ def _run_agent_streaming(
            # but guard defensively to avoid TypeError on an older agent build.
            if 'reasoning_config' in _agent_params and _reasoning_config is not None:
                _agent_kwargs['reasoning_config'] = _reasoning_config
+            if 'max_tokens' in _agent_params and _max_tokens_cfg is not None:
+                _agent_kwargs['max_tokens'] = _max_tokens_cfg
            # Params added in newer hermes-agent — skip if not supported
            if 'api_mode' in _agent_params:
                _agent_kwargs['api_mode'] = _rt.get('api_mode')
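Reviewer note: the two hunks above (parse, then signature-guarded pass-through) can be exercised in isolation. This is a self-contained sketch; `resolve_max_tokens`, `build_kwargs`, `OldAgent` and `NewAgent` are hypothetical names, and the real code works on `_cfg` and `_AIAgent` inline.

```python
import inspect


def resolve_max_tokens(cfg: dict):
    """Return a positive int from the top-level or agent.max_tokens key, else None."""
    try:
        raw = cfg.get("max_tokens")
        if raw is None:
            agent_cfg = cfg.get("agent", {})
            if isinstance(agent_cfg, dict):
                raw = agent_cfg.get("max_tokens")
        if raw is not None:
            parsed = int(raw)
            if parsed > 0:
                return parsed
    except Exception:
        # Same broad fallback as the diff: unparseable values mean "no cap".
        return None
    return None


class OldAgent:  # hypothetical constructor without max_tokens
    def __init__(self, model):
        self.model = model


class NewAgent:  # hypothetical constructor that accepts max_tokens
    def __init__(self, model, max_tokens=None):
        self.model = model
        self.max_tokens = max_tokens


def build_kwargs(agent_cls, cfg):
    # Inspect the constructor instead of catching TypeError after the call.
    params = set(inspect.signature(agent_cls.__init__).parameters)
    kwargs = {"model": "demo"}
    max_tokens = resolve_max_tokens(cfg)
    if "max_tokens" in params and max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Note that a top-level value that parses to zero or a negative number yields no cap; it does not fall through to `agent.max_tokens`, which matches the order of checks in the diff.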
@@ -1861,6 +1882,8 @@ def _run_agent_streaming(
                    _hashlib.sha256((resolved_api_key or '').encode()).hexdigest()[:16],
                    resolved_base_url or '',
                    resolved_provider or '',
+                    _max_tokens_cfg or '',
+                    _fallback_resolved or {},
                    sorted(_toolsets) if _toolsets else [],
                ], sort_keys=True)
                _agent_sig = _hashlib.sha256(_sig_blob.encode()).hexdigest()[:16]
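Reviewer note: a reduced sketch of why adding `max_tokens` and the fallback state to the signature matters. `agent_signature` is a hypothetical function covering only a few of the fields hashed in the hunk above.

```python
import hashlib
import json


def agent_signature(provider, max_tokens_cfg, fallback_resolved, toolsets):
    """Hypothetical reduced signature: only some of the real fields are shown."""
    blob = json.dumps(
        [
            provider or "",
            max_tokens_cfg or "",
            fallback_resolved or {},
            sorted(toolsets) if toolsets else [],
        ],
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode()).hexdigest()[:16]


before = agent_signature("openrouter", 4096, None, ["web"])
after = agent_signature("openrouter", 8192, None, ["web"])
print(before != after)
```

Because the cap is part of the hashed blob, editing `max_tokens` in the config produces a different key, so a cached agent built with the old cap is not reused.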
@@ -2098,6 +2121,9 @@ def _run_agent_streaming(
                        'insufficient credit' in _err_lower
                        or 'credit balance' in _err_lower
                        or 'credits exhausted' in _err_lower
+                        or 'more credits' in _err_lower
+                        or 'can only afford' in _err_lower
+                        or 'fewer max_tokens' in _err_lower
                        or 'quota_exceeded' in _err_lower
                        or 'quota exceeded' in _err_lower
                        or 'exceeded your current quota' in _err_lower
@@ -2433,6 +2459,9 @@ def _run_agent_streaming(
            'insufficient credit' in _exc_lower
            or 'credit balance' in _exc_lower
            or 'credits exhausted' in _exc_lower
+            or 'more credits' in _exc_lower
+            or 'can only afford' in _exc_lower
+            or 'fewer max_tokens' in _exc_lower
            or 'quota_exceeded' in _exc_lower
            or 'quota exceeded' in _exc_lower
            or 'exceeded your current quota' in _exc_lower
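Reviewer note: the same three phrases are added to two separate `or` chains. A possible consolidation is sketched below, purely as an illustration. `QUOTA_MARKERS` and `looks_like_quota_error` are hypothetical names, and the tuple lists only the markers visible in these hunks; the real conditions may contain more.

```python
QUOTA_MARKERS = (
    "insufficient credit",
    "credit balance",
    "credits exhausted",
    "more credits",      # added in this release
    "can only afford",   # added in this release
    "fewer max_tokens",  # added in this release
    "quota_exceeded",
    "quota exceeded",
    "exceeded your current quota",
)


def looks_like_quota_error(message: str) -> bool:
    """Hypothetical helper: case-insensitive substring match on known markers."""
    lowered = message.lower()
    return any(marker in lowered for marker in QUOTA_MARKERS)
```

Adopting something like this would also require updating `test_streaming_max_tokens_quota.py`, which asserts the literal `'more credits' in _err_lower` style expressions in the source.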
@@ -470,14 +470,17 @@ window._micPendingSend=window._micPendingSend||false;
    try{ return localStorage.getItem('hermes-voice-mode-button')==='true'; }
    catch(_){ return false; }
  }
+  let _voiceModeActive=false;
+
  function _applyVoiceModePref(){
-    modeBtn.style.display = _voiceModePrefEnabled() ? '' : 'none';
+    const enabled = _voiceModePrefEnabled();
+    modeBtn.style.display = enabled ? '' : 'none';
+    if(!enabled && _voiceModeActive) _deactivate();
  }
  _applyVoiceModePref();
  // Expose so the settings pane can re-apply immediately on toggle.
  window._applyVoiceModePref = _applyVoiceModePref;

-  let _voiceModeActive=false;
  let _voiceModeState='idle'; // idle | listening | thinking | speaking
  let _recognition=null;
  let _silenceTimer=null;
@@ -197,7 +197,7 @@ function _renderOnboardingApiKeyField(){
  const labelKey=keyOptional?'onboarding_api_key_label_optional':'onboarding_api_key_label';
  const placeholderKey=keyOptional?'onboarding_api_key_placeholder_optional':'onboarding_api_key_placeholder';
  const helpHtml=keyOptional?`<p class="onboarding-copy onboarding-api-key-help">${esc(t('onboarding_api_key_help_keyless')||'')}</p>`:'';
-  return `<label class="onboarding-field" id="onboardingApiKeyField"><span>${t(labelKey)}</span><input id="onboardingApiKeyInput" type="password" value="${esc(ONBOARDING.form.apiKey||'')}" placeholder="${t(placeholderKey)}" oninput="ONBOARDING.form.apiKey=this.value;_scheduleOnboardingProbe()"></label>${helpHtml}`;
+  return `<label class="onboarding-field" id="onboardingApiKeyField"><span>${t(labelKey)}</span><input id="onboardingApiKeyInput" type="password" value="${esc(ONBOARDING.form.apiKey||'')}" placeholder="${t(placeholderKey)}" oninput="ONBOARDING.form.apiKey=this.value" onblur="_runOnboardingProbe()"></label>${helpHtml}`;
 }

 function _getOnboardingSelectedModel(){
@@ -387,6 +387,15 @@ async function loadSession(sid){
  _setActiveSessionUrl(S.session.session_id);

  const activeStreamId=S.session.active_stream_id||null;
+  // If the server says the session is idle, discard any browser-side inflight
+  // cache left behind by a crashed/restarted stream. Otherwise the UI can keep
+  // showing a permanent thinking/running state even though active_streams=0.
+  if(!activeStreamId&&INFLIGHT[sid]){
+    delete INFLIGHT[sid];
+    if(typeof clearInflightState==='function') clearInflightState(sid);
+    S.activeStreamId=null;
+    S.busy=false;
+  }

  // Phase 2a: If session is streaming, restore from INFLIGHT cache before
  // loading full messages (INFLIGHT state is self-contained and sufficient).
@@ -1654,7 +1663,7 @@ function renderSessionListFromCache(){
    if(s.parent_session_id){
      const branchInd=document.createElement('span');
      branchInd.className='session-branch-indicator';
-      branchInd.textContent='\u2482'; // ⑂
+      branchInd.textContent='\u2442'; // ⑂
      branchInd.title=(typeof t==='function'?t('forked_from'):'Forked from')+' '+s.parent_session_id;
      branchInd.style.cursor='pointer';
      branchInd.onclick=(e)=>{
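Reviewer note: the two codepoints differ by a single hex digit, which is easy to miss in review. A quick check with Python's standard `unicodedata` module confirms which character each escape refers to:

```python
import unicodedata

wrong = "\u2482"  # the codepoint the sidebar rendered before the fix
right = "\u2442"  # the codepoint after the fix

wrong_name = unicodedata.name(wrong)
right_name = unicodedata.name(right)
print(f"U+{ord(wrong):04X} {wrong_name}")
print(f"U+{ord(right):04X} {right_name}")
```

The trailing `// ⑂` comment on the removed line shows the intended glyph, which is why the comment alone could not reveal the bug.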
@@ -735,6 +735,8 @@
  .msg-body pre code{background:none;padding:0;border-radius:0;color:var(--pre-text);font-size:13px;line-height:1.6;}
  /* Keep original theme background — prevent prism-tomorrow from overriding --code-bg */
  .msg-body pre[class*="language-"],.msg-body pre code[class*="language-"]{background:var(--code-bg) !important;}
+  /* Fix #1463: Prism YAML grammar collapses newlines inside token spans — force pre */
+  .msg-body pre code.language-yaml .token{white-space:pre !important;}
  .pre-header{font-size:10px;font-weight:600;text-transform:uppercase;letter-spacing:.06em;color:var(--muted);padding:8px 16px 8px;background:var(--input-bg);border-radius:10px 10px 0 0;border:1px solid var(--border);border-bottom:1px solid var(--border);display:flex;align-items:center;gap:6px;}
  .pre-header::before{content:'';width:8px;height:8px;border-radius:50%;background:var(--muted);opacity:.4;}
  .pre-header+pre{border-radius:0 0 10px 10px;border-top:none;margin-top:0;}
@@ -1128,6 +1130,8 @@
  .preview-md pre code{background:none;padding:0;color:var(--pre-text);font-size:11.5px;line-height:1.55;}
  /* Keep original theme background — prevent prism-tomorrow from overriding --code-bg */
  .preview-md pre[class*="language-"],.preview-md pre code[class*="language-"]{background:var(--code-bg) !important;}
+  /* Fix #1463: Prism YAML grammar collapses newlines inside token spans — force pre */
+  .preview-md pre code.language-yaml .token{white-space:pre !important;}
  .preview-md blockquote{border-left:3px solid var(--blue);padding-left:12px;color:var(--muted);font-style:italic;margin:8px 0;}
  .preview-md blockquote p{margin:0;}
  .preview-md strong{color:var(--strong);font-weight:600;}.preview-md em{color:var(--em);}
@@ -7,18 +7,18 @@

 // Cache version is injected by the server at request time (routes.py /sw.js handler).
 // Bumps automatically whenever the git commit changes — no manual edits needed.
-const CACHE_NAME = 'hermes-shell-__CACHE_VERSION__';
+const CACHE_NAME = 'hermes-shell-__WEBUI_VERSION__';

 // Static assets that form the app shell.
 //
-// Versioned assets (CSS + JS) include `?v=__CACHE_VERSION__` to match the
+// Versioned assets (CSS + JS) include `?v=__WEBUI_VERSION__` to match the
 // query string the page sends — see index.html. Without the version query
 // here, every cache lookup against `?v=...` URLs would miss and fall through
 // to network, defeating the pre-cache.
 //
 // Unversioned assets (`./`, manifest.json, favicons) are referenced from
 // index.html without a cache-bust query, so they stay unversioned here too.
-const VQ = '?v=__CACHE_VERSION__';
+const VQ = '?v=__WEBUI_VERSION__';
 const SHELL_ASSETS = [
  './',
  './static/style.css' + VQ,
@@ -225,7 +225,7 @@ def test_sidebar_parent_indicator():
        "sessions.js should check parent_session_id"
    assert 'session-branch-indicator' in src, \
        "Should have session-branch-indicator class"
-    assert '\\u2482' in src, \
+    assert '\\u2442' in src, \
        "Should use ⑂ character for parent indicator"


@@ -208,6 +208,41 @@ def test_gateway_sessions_appear_when_enabled():
        post('/api/settings', {'show_cli_sessions': False})


+def test_webui_state_db_session_without_sidecar_appears_when_agent_sessions_enabled():
+    """Regression: WebUI-origin rows in state.db can recover missing JSON sidecars."""
+    conn = _ensure_state_db()
+    sid = 'webui_state_only_001'
+    try:
+        _insert_agent_session_row(
+            conn,
+            session_id=sid,
+            source='webui',
+            title='Recovered WebUI Session',
+            model='openai/gpt-5',
+            messages=2,
+        )
+
+        post('/api/settings', {'show_cli_sessions': True})
+
+        data, status = get('/api/sessions')
+        assert status == 200
+        sessions = data.get('sessions', [])
+        recovered = [s for s in sessions if s.get('session_id') == sid]
+        assert len(recovered) == 1, (
+            "WebUI-origin sessions that exist in state.db but have no JSON sidecar "
+            "should be surfaced through the agent-session bridge for recovery."
+        )
+        assert recovered[0].get('source_tag') == 'webui'
+        assert recovered[0].get('is_cli_session') is True
+    finally:
+        try:
+            _remove_test_sessions(conn, sid)
+            conn.close()
+        except Exception:
+            pass
+        post('/api/settings', {'show_cli_sessions': False})
+
+
 def test_gateway_sessions_without_messages_are_hidden_from_sidebar():
    """Regression: empty agent session rows must not appear as broken sidebar entries."""
    conn = _ensure_state_db()
@@ -2,7 +2,7 @@

 Covers:
 - manifest.json is valid JSON with required PWA fields
- sw.js has the `__CACHE_VERSION__` placeholder the server replaces at request time
+- sw.js has the `__WEBUI_VERSION__` placeholder the server replaces at request time
 - sw.js offline-fallback uses a resolved promise (not `caches.match() || fallback`
  which is broken — Promise objects are always truthy in `||` checks, so the
  fallback Response would never be used)
@@ -52,11 +52,30 @@ class TestManifest:
 class TestServiceWorker:
    def test_sw_has_cache_version_placeholder(self):
        src = SW.read_text(encoding="utf-8")
-        assert "__CACHE_VERSION__" in src, (
-            "sw.js must contain __CACHE_VERSION__ placeholder for the server "
+        assert "__WEBUI_VERSION__" in src, (
+            "sw.js must contain __WEBUI_VERSION__ placeholder for the server "
            "handler at /sw.js to replace with WEBUI_VERSION at request time"
        )

+    def test_sw_js_has_no_merge_conflict_markers(self):
+        """Regression guard for v0.50.279 stage build: a leftover git conflict
+        marker in static/sw.js made the file fail to parse as JavaScript even
+        though the substring-based source-string tests still passed (the
+        ``__WEBUI_VERSION__`` token was present, just inside the conflict block).
+
+        A broken sw.js means the install handler throws on script load → SW
+        never reaches activated state → old SW keeps controlling the page →
+        every "old SW deletes other caches" guarantee is forfeited and frontend
+        cache-bust pathways silently break. Caught by Opus advisor pre-merge,
+        ship blocked. This test would have caught it too.
+        """
+        src = SW.read_text(encoding="utf-8")
+        for marker in ("<<<<<<<", "=======\n", ">>>>>>>"):
+            assert marker not in src, (
+                f"static/sw.js contains conflict marker {marker!r}; "
+                "the merge resolution did not actually land. Reject ship."
+            )
+
    def test_sw_bypasses_api_and_stream(self):
        src = SW.read_text(encoding="utf-8")
        assert "/api/" in src, "SW must bypass /api/* (no cached auth/session responses)"
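Reviewer note: the new guard uses plain substring checks, so a comment banner that ends in seven or more `=` characters would also trip the `"=======\n"` check. A line-anchored variant is sketched below as an optional tightening; `CONFLICT_RE` and `has_conflict_markers` are hypothetical names, and the pattern assumes LF line endings and does not cover the `|||||||` marker from diff3-style conflicts.

```python
import re

# Conflict markers start at column 0: "<<<<<<<" or ">>>>>>>" followed by a
# space or end of line, or a line consisting of exactly seven "=".
CONFLICT_RE = re.compile(r"^(<{7}( |$)|={7}$|>{7}( |$))", re.MULTILINE)


def has_conflict_markers(text: str) -> bool:
    """Return True if any line looks like a git conflict marker."""
    return CONFLICT_RE.search(text) is not None


conflicted = (
    "const A = 1;\n"
    "<<<<<<< HEAD\n"
    "const B = 2;\n"
    "=======\n"
    "const B = 3;\n"
    ">>>>>>> stage\n"
)
print(has_conflict_markers(conflicted))
```

The substring version in the diff is stricter in the safe direction (it can only over-report), so keeping it as shipped is also reasonable.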
@@ -117,8 +136,8 @@ class TestPWARoutes:
        idx = src.find('"/sw.js"')
        assert idx != -1, "routes.py must handle /sw.js"
        block = src[idx:idx + 1000]
-        assert "__CACHE_VERSION__" in block, (
-            "sw.js route must replace __CACHE_VERSION__ with the current WEBUI_VERSION"
+        assert "__WEBUI_VERSION__" in block, (
+            "sw.js route must replace __WEBUI_VERSION__ with the current WEBUI_VERSION"
        )
        assert "WEBUI_VERSION" in block, (
            "sw.js route must import and use WEBUI_VERSION for cache busting"
@@ -185,7 +204,7 @@ class TestIndexHtmlIntegration:

    def test_sw_shell_assets_match_versioned_asset_urls(self):
        """The service worker's SHELL_ASSETS pre-cache list must use the same
-        `?v=__CACHE_VERSION__` suffix on JS+CSS that index.html sends, so that
+        `?v=__WEBUI_VERSION__` suffix on JS+CSS that index.html sends, so that
        the pre-cached entries actually serve when the page requests them.

        Without this, every `cache.match()` for a versioned asset URL (e.g.
@@ -208,13 +227,13 @@ class TestIndexHtmlIntegration:
            "terminal.js",
            "onboarding.js",
        ):
-            # Either inline `?v=__CACHE_VERSION__` or via the VQ constant
+            # Either inline `?v=__WEBUI_VERSION__` or via the VQ constant
            # produces a URL string the cache lookup can match.
-            has_inline = f"{asset}?v=__CACHE_VERSION__" in src
+            has_inline = f"{asset}?v=__WEBUI_VERSION__" in src
            has_concat = f"{asset}' + VQ" in src or f"{asset}\" + VQ" in src
            assert has_inline or has_concat, (
                f"sw.js SHELL_ASSETS entry for {asset} must carry "
-                "?v=__CACHE_VERSION__ to match the URL the page requests"
+                "?v=__WEBUI_VERSION__ to match the URL the page requests"
            )

    def test_index_route_url_encodes_asset_version(self):
@@ -0,0 +1,65 @@
+from pathlib import Path
+
+REPO = Path(__file__).resolve().parents[1]
+ROUTES_SRC = (REPO / "api" / "routes.py").read_text(encoding="utf-8")
+SESSIONS_SRC = (REPO / "static" / "sessions.js").read_text(encoding="utf-8")
+SW_SRC = (REPO / "static" / "sw.js").read_text(encoding="utf-8")
+
+
+def test_stale_stream_cleanup_helper_exists():
+    assert "def _clear_stale_stream_state(session)" in ROUTES_SRC
+    assert "stream_id in STREAMS" in ROUTES_SRC
+    assert "session.active_stream_id = None" in ROUTES_SRC
+    assert "session.pending_user_message = None" in ROUTES_SRC
+    assert "session.pending_attachments = []" in ROUTES_SRC
+    assert "session.pending_started_at = None" in ROUTES_SRC
+    assert "session.save()" in ROUTES_SRC
+
+
+def test_session_load_clears_stale_stream_before_response():
+    load_pos = ROUTES_SRC.index("s = get_session(sid, metadata_only=(not load_messages))")
+    cleanup_pos = ROUTES_SRC.index("_clear_stale_stream_state(s)", load_pos)
+    response_pos = ROUTES_SRC.index('"active_stream_id": getattr(s, "active_stream_id", None)', cleanup_pos)
+    assert load_pos < cleanup_pos < response_pos
+
+
+def test_chat_start_clears_stale_pending_state_not_only_active_id():
+    stale_comment_pos = ROUTES_SRC.index("# Stale stream id from a previous run; clear and continue.")
+    cleanup_pos = ROUTES_SRC.index("_clear_stale_stream_state(s)", stale_comment_pos)
+    stream_id_pos = ROUTES_SRC.index("stream_id = uuid.uuid4().hex", cleanup_pos)
+    assert stale_comment_pos < cleanup_pos < stream_id_pos
+
+
+def test_frontend_drops_inflight_cache_when_server_session_is_idle():
+    marker = "If the server says the session is idle, discard any browser-side inflight"
+    marker_pos = SESSIONS_SRC.index(marker)
+    window = SESSIONS_SRC[marker_pos:marker_pos + 500]
+    assert "if(!activeStreamId&&INFLIGHT[sid])" in window
+    assert "delete INFLIGHT[sid]" in window
+    assert "clearInflightState" in window
+    assert "S.busy=false" in window
+
+
+def test_service_worker_cache_bumped_for_frontend_fix_delivery():
+    """The SW CACHE_NAME must be keyed on the WEBUI_VERSION placeholder so
+    every release naturally invalidates the previous shell cache and delivers
+    the frontend half of the stale-stream cleanup fix to existing browsers.
+
+    Originally pinned a manual `-stale-stream-cleanup1` suffix on
+    `CACHE_NAME` (PR #1525 author shipped that to force-bump existing
+    SWs). During the v0.50.279 stage build that suffix collided with the
+    independent #1517 placeholder rename (`__CACHE_VERSION__` →
+    `__WEBUI_VERSION__`), so the maintainer dropped the manual suffix in
+    favor of the canonical version-token path. The natural bump still
+    invalidates the old cache via `keys.filter((k) => k !== CACHE_NAME)`
+    in the activate handler — same delivery guarantee, less churn.
+    """
+    # CACHE_NAME must include the WEBUI_VERSION placeholder so each release
+    # produces a different cache name. The activate handler then deletes any
+    # cache whose key != current CACHE_NAME, so the old shell is reaped on
+    # every upgrade and the new sessions.js (with the INFLIGHT[sid] clear)
+    # ships to existing browsers.
+    assert "CACHE_NAME = 'hermes-shell-__WEBUI_VERSION__'" in SW_SRC, (
+        "SW CACHE_NAME must include __WEBUI_VERSION__ so each release "
+        "invalidates the previous cache and delivers frontend changes."
+    )
@@ -0,0 +1,39 @@
+"""Regression coverage for WebUI streaming provider failure handling.
+
+The incident this guards against: WebUI-created AIAgent instances did not pass
+config.yaml's max_tokens, so a fallback Claude model via OpenRouter requested its
+native 64k output ceiling and failed with HTTP 402 "more credits / fewer
+max_tokens". The stream then looked like a stuck Thinking card instead of a
+clear quota error.
+"""
+from pathlib import Path
+
+
+STREAMING = Path(__file__).resolve().parents[1] / "api" / "streaming.py"
+
+
+def _src() -> str:
+    return STREAMING.read_text(encoding="utf-8")
+
+
+def test_streaming_passes_configured_max_tokens_to_agent():
+    src = _src()
+    assert "_raw_max_tokens = _cfg.get('max_tokens')" in src
+    assert "_agent_cfg_for_tokens.get('max_tokens')" in src
+    assert "_agent_kwargs['max_tokens'] = _max_tokens_cfg" in src
+
+
+def test_streaming_agent_cache_signature_includes_max_tokens_and_fallback():
+    src = _src()
+    assert "_max_tokens_cfg or ''" in src
+    assert "_fallback_resolved or {}" in src
+
+
+def test_openrouter_more_credits_error_is_classified_as_quota():
+    src = _src()
+    assert "'more credits' in _err_lower" in src
+    assert "'can only afford' in _err_lower" in src
+    assert "'fewer max_tokens' in _err_lower" in src
+    assert "'more credits' in _exc_lower" in src
+    assert "'can only afford' in _exc_lower" in src
+    assert "'fewer max_tokens' in _exc_lower" in src