Merge pull request #2400 from nesquena/stage-369

v0.51.76 — Release AZ (stage-369: 4-PR safe-lane batch)
2026-05-25 19:20:16 +00:00 · 2026-05-16 13:16:43 -07:00
parent 3de4338610 069503f0bf
commit 12b0b6dab3
17 changed files with 1477 additions and 73 deletions
@@ -2,6 +2,20 @@

 ## [Unreleased]

+## [v0.51.76] — 2026-05-16 — Release AZ (stage-369 — 4-PR safe-lane batch — live timeline preservation + OpenRouter cost history + chat stream cap + credential pool cache)
+
+### Added
+
+- **PR #2195** by @Michaelyklam (refs #692) — OpenRouter cost history backend. New `GET /api/provider/cost-history?provider=openrouter` endpoint (optional `days` query parameter, clamped to 1-365, default 7) backed by daily snapshots of the cumulative spend reported by OpenRouter's `/auth/key`. A process-local lock guards the snapshot read-modify-write critical section so concurrent dashboard refreshes or multiple tabs cannot overwrite newer reads with stale ones. Delta computation handles cumulative-counter resets (key rotation, OpenRouter-side reset) by starting a fresh series and using the current value as that day's delta rather than emitting negative spend. Backend-only slice; the 7-day daily cost chart UI is a separate follow-up.
+
+### Fixed
+
+- **PR #2347** by @franksong2702 (fixes #2344) — Preserve live agent timeline across session switches. Previously, switching away from an active stream and returning rebuilt the turn from the persisted `INFLIGHT` tail, which is enough to reconnect the stream but is not a full-fidelity DOM timeline — Thinking/tool grouping flattened, interim assistant text moved away from its surrounding context, auto-compression cards could project twice. The restore path now snapshots the live assistant turn DOM during the active stream and, on return, loads the persisted transcript first then merges the live snapshot back in so the on-screen scene is preserved as the user left it. Stamping `row.dataset.sessionId` at turn creation prevents the new live-turn sites from re-triggering the lossy rebuild path.
+
+- **PR #2393** by @Michaelyklam (refs #2313) — Cap live chat stream transports to the selected conversation. Previously, keeping many sessions open accumulated one long-lived `/api/chat/stream` EventSource per session. New `closeOtherLiveStreams(activeSid)` helper in `static/messages.js`; `attachLiveStream()` now reuses an existing same-session transport first, closes other sessions' chat SSE transports, then opens or replaces the selected session's stream. Background sessions still reattach normally when the user selects them — only the SSE transport is pruned, not the server-side stream ownership. New regression test pins the ordering (reuse first, prune background streams next, replace active transport last).
+
+- **PR #2396** by @starship-s — Preserve session agents for credential pools. The per-session `AIAgent` cache signature previously mixed stable agent identity with the volatile resolved API key, so credential-pool providers (where each request can resolve a different runtime token even when provider/model config is unchanged) missed the cache every turn and rebuilt the agent — losing warmed cross-turn state such as memory-provider prefetch results for providers like Hindsight. New credential-aware cache-signature helper uses a stable sentinel for credential-pool routes while preserving hashed API-key identity for non-pool routes; reused cached agents refresh runtime credentials in place; `AIAgent._primary_runtime` stays aligned after refresh so fallback/transport recovery cannot resurrect an old token; agents still in fallback-active state rebuild rather than mutate to avoid mixed primary/fallback runtime state. Static non-pool API keys still participate in the cache signature so explicit credential changes continue to invalidate.
+
 ## [v0.51.75] — 2026-05-16 — Release AY (stage-368 — 11-PR safe-lane batch — storage + i18n + run-journal parity + attachments + compression sidebar + restart-recovery + text-mode images + tables + settings i18n + German labels)

 ### Test infrastructure
@@ -92,6 +106,8 @@

 ### Added

+- **PR #2347** by @franksong2702 — Long tool-heavy streaming turns now preserve the live Thinking / assistant progress / Tool / Command timeline when the user switches away and back. The active stream keeps accumulating token and interim-assistant state while inactive, reloads the persisted transcript before merging the live tail, restores the live turn DOM snapshot instead of replaying tools into a flat list, and anchors automatic compression cards inside the active turn to avoid duplicate cards while an answer is still streaming.
+
 - **PR #2332** by @Michaelyklam (refs #2290) — Cron run history/output cards now surface token/cost metadata when the underlying cron output markdown includes it. The backend parses optional model/token/cost/duration frontmatter from cron output files and returns it from `/api/crons/history` and `/api/crons/run`; the Tasks panel renders a compact usage strip beside run rows and below expanded output without affecting older outputs that lack usage metadata.

 ### Fixed
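
The counter-reset rule in the PR #2195 changelog entry can be illustrated with a small standalone sketch. It is not the code from this commit: `daily_deltas` is a hypothetical name and the input is a plain list of cumulative readings, one per day, rather than the snapshot dicts the backend stores.

```python
from __future__ import annotations


def daily_deltas(cumulative: list[float | None]) -> list[float | None]:
    """Turn cumulative spend readings into per-day deltas (illustrative only).

    The first reading has no baseline, so its delta is None. A drop in the
    cumulative value is treated as a counter reset: the current value becomes
    that day's delta instead of a negative number.
    """
    deltas: list[float | None] = []
    for i, current in enumerate(cumulative):
        previous = cumulative[i - 1] if i > 0 else None
        if current is None or previous is None:
            # No usable baseline for this day.
            deltas.append(None)
            continue
        delta = float(current) - float(previous)
        if delta < 0:
            # Counter went backwards: start a fresh series.
            delta = float(current)
        deltas.append(round(delta, 6))
    return deltas


example = daily_deltas([10.0, 12.5, 3.0, 3.0, None, 4.0])
print(example)  # [None, 2.5, 3.0, 0.0, None, None]
```

The third reading drops from 12.5 to 3.0, so the sketch reports 3.0 for that day rather than -9.5.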
@@ -1441,6 +1441,266 @@ def _provider_is_oauth(provider_id: str) -> bool:
    return provider_id in _OAUTH_PROVIDERS


+# ── OpenRouter cost-history snapshot helpers (#692) ──────────────────────────
+
+_COST_SNAPSHOTS_DIR_NAME = "cost-snapshots"
+_COST_SNAPSHOT_MAX_DAYS = 365  # hard cap to prevent unbounded growth
+_COST_SNAPSHOT_LOCK = threading.Lock()
+
+
+def _cost_snapshots_dir() -> Path:
+    """Return the directory for cost-snapshot JSON files.
+
+    Uses the Hermes home directory (profile-aware) so snapshots are
+    isolated per profile, matching the existing STATE_DIR convention.
+    """
+    return _get_hermes_home() / _COST_SNAPSHOTS_DIR_NAME
+
+
+def _fetch_openrouter_key_usage(api_key: str) -> dict[str, Any] | None:
+    """Fetch current usage/limit from the OpenRouter ``/auth/key`` endpoint.
+
+    Returns a dict with ``usage``, ``limit``, ``label`` on success, or
+    ``None`` on any failure.  Never raises; callers handle the None case.
+    """
+    req = urllib.request.Request(
+        _OPENROUTER_KEY_URL,
+        headers={
+            "Authorization": f"Bearer {api_key}",
+            "Accept": "application/json",
+        },
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=_PROVIDER_QUOTA_TIMEOUT_SECONDS) as resp:
+            raw = resp.read()
+        payload = json.loads(raw.decode("utf-8") if isinstance(raw, (bytes, bytearray)) else raw)
+        sanitized = _sanitize_openrouter_quota(payload)
+        label = None
+        if isinstance(payload, dict):
+            data = payload.get("data", payload)
+            if isinstance(data, dict):
+                label = str(data.get("label") or "").strip() or None
+        return {
+            "usage": sanitized.get("usage"),
+            "limit": sanitized.get("limit"),
+            "label": label,
+        }
+    except Exception:
+        logger.debug("OpenRouter key usage fetch failed for cost-history", exc_info=True)
+        return None
+
+
+def _read_cost_snapshots(provider: str) -> list[dict[str, Any]]:
+    """Read persisted daily snapshots for *provider* from disk.
+
+    Returns a list of ``{date, used, limit}`` dicts sorted by date
+    ascending.  Returns an empty list if the file does not exist or is
+    corrupt.
+    """
+    path = _cost_snapshots_dir() / f"{provider}.json"
+    if not path.exists():
+        return []
+    try:
+        raw = path.read_text(encoding="utf-8")
+        data = json.loads(raw)
+    except (OSError, json.JSONDecodeError, ValueError):
+        return []
+    if not isinstance(data, dict):
+        return []
+    snapshots = data.get("snapshots")
+    if not isinstance(snapshots, list):
+        return []
+    # Validate and sort
+    valid = []
+    for entry in snapshots:
+        if not isinstance(entry, dict):
+            continue
+        date = str(entry.get("date") or "").strip()
+        if not date:
+            continue
+        valid.append({
+            "date": date,
+            "used": _quota_number(entry.get("used")),
+            "limit": _quota_number(entry.get("limit")),
+        })
+    valid.sort(key=lambda e: e["date"])
+    return valid
+
+
+def _write_cost_snapshots(provider: str, snapshots: list[dict[str, Any]]) -> None:
+    """Persist daily snapshots for *provider* to disk atomically."""
+    snap_dir = _cost_snapshots_dir()
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    path = snap_dir / f"{provider}.json"
+    payload = {"provider": provider, "snapshots": snapshots}
+    body = json.dumps(payload, ensure_ascii=False, indent=2)
+    import tempfile as _tempfile
+    _tmp_fd, _tmp_path = _tempfile.mkstemp(
+        dir=str(snap_dir), prefix=f".{provider}_", suffix=".tmp"
+    )
+    try:
+        with os.fdopen(_tmp_fd, "w", encoding="utf-8") as _f:
+            _f.write(body)
+            _f.flush()
+            os.fsync(_f.fileno())
+        os.replace(_tmp_path, path)
+    except BaseException:
+        try:
+            os.unlink(_tmp_path)
+        except OSError:
+            pass
+        raise
+
+
+def _append_cost_snapshot(provider: str, usage: int | float | None, limit: int | float | None) -> list[dict[str, Any]]:
+    """Append today's snapshot and return the updated list.
+
+    If a snapshot for today already exists it is updated in-place so
+    repeated calls within the same day are idempotent.
+    """
+    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
+    # Serialize the read-modify-write cycle.  The atomic os.replace in
+    # _write_cost_snapshots protects the file write itself, but without this
+    # lock two concurrent requests can both read the same old snapshot list and
+    # race to replace it with stale data.
+    with _COST_SNAPSHOT_LOCK:
+        snapshots = _read_cost_snapshots(provider)
+        # Update or append today's entry
+        updated = False
+        for entry in snapshots:
+            if entry["date"] == today:
+                entry["used"] = usage
+                entry["limit"] = limit
+                updated = True
+                break
+        if not updated:
+            snapshots.append({"date": today, "used": usage, "limit": limit})
+        snapshots.sort(key=lambda e: e["date"])
+        # Cap to _COST_SNAPSHOT_MAX_DAYS entries (keep most recent)
+        if len(snapshots) > _COST_SNAPSHOT_MAX_DAYS:
+            snapshots = snapshots[-_COST_SNAPSHOT_MAX_DAYS:]
+        _write_cost_snapshots(provider, snapshots)
+        return snapshots
+
+
+def _compute_deltas(snapshots: list[dict[str, Any]], window_days: int) -> list[dict[str, Any]]:
+    """Compute daily deltas from cumulative usage snapshots.
+
+    Each snapshot carries cumulative ``used``; the delta for a day is
+    the difference between that day's cumulative value and the previous
+    day's.  The oldest day in the window has ``delta=None`` (no
+    previous baseline).  If the cumulative value drops, treat that day
+    as the start of a fresh series (for example after an API-key rotation)
+    and use the current value as that day's delta instead of emitting a
+    negative spend bar.
+    """
+    # Take only the last *window_days* entries
+    window = snapshots[-window_days:] if len(snapshots) > window_days else list(snapshots)
+    result: list[dict[str, Any]] = []
+    for i, entry in enumerate(window):
+        delta = None
+        if i > 0 and entry.get("used") is not None and window[i - 1].get("used") is not None:
+            delta = float(entry["used"]) - float(window[i - 1]["used"])
+            if delta < 0:
+                delta = float(entry["used"])
+            # Rounding: avoid -0.0 and tiny floating-point noise
+            if abs(delta) < 1e-9:
+                delta = 0.0
+            else:
+                delta = round(delta, 6)
+        result.append({
+            "date": entry["date"],
+            "used": entry.get("used"),
+            "delta": delta,
+        })
+    return result
+
+
+def get_provider_cost_history(provider_id: str | None = None, days: int = 7) -> dict[str, Any]:
+    """Return daily cost-history snapshots with deltas for a provider.
+
+    Currently only ``openrouter`` is supported.  On each call the
+    endpoint fetches the current cumulative usage from the OpenRouter
+    ``/auth/key`` endpoint, appends/updates today's snapshot, and
+    returns the last *days* snapshots with per-day deltas.
+
+    Returns a dict matching the existing API style (``ok``, ``provider``,
+    ``status``, ``message``, …).
+    """
+    provider = (provider_id or "").strip().lower()
+    if not provider:
+        return {
+            "ok": False,
+            "provider": None,
+            "status": "missing_provider",
+            "message": "Provider parameter is required.  Use ?provider=openrouter",
+        }
+
+    if provider != "openrouter":
+        display_name = _PROVIDER_DISPLAY.get(provider, provider.replace("-", " ").title())
+        return {
+            "ok": False,
+            "provider": provider,
+            "display_name": display_name,
+            "supported": False,
+            "status": "unsupported",
+            "message": f"Cost history is not available for {display_name}. Only openrouter is supported in this release.",
+        }
+
+    display_name = _PROVIDER_DISPLAY.get("openrouter", "OpenRouter")
+    api_key = _get_provider_api_key("openrouter")
+    if not api_key:
+        return {
+            "ok": False,
+            "provider": "openrouter",
+            "display_name": display_name,
+            "supported": True,
+            "status": "no_key",
+            "message": "OpenRouter cost history needs an OPENROUTER_API_KEY configured on the server.",
+        }
+
+    # Fetch current cumulative usage from OpenRouter
+    key_info = _fetch_openrouter_key_usage(api_key)
+    if key_info is None:
+        # Upstream failure — still return any previously persisted snapshots
+        # so the chart degrades gracefully instead of going blank.
+        snapshots = _read_cost_snapshots("openrouter")
+        deltas = _compute_deltas(snapshots, days)
+        return {
+            "ok": False,
+            "provider": "openrouter",
+            "display_name": display_name,
+            "supported": True,
+            "status": "unavailable",
+            "window_days": days,
+            "snapshots": deltas,
+            "limit": None,
+            "label": None,
+            "message": "OpenRouter cost history is temporarily unavailable. Showing last known data.",
+        }
+
+    # Persist today's snapshot
+    try:
+        snapshots = _append_cost_snapshot("openrouter", key_info["usage"], key_info["limit"])
+    except Exception:
+        logger.debug("Failed to persist cost snapshot for openrouter", exc_info=True)
+        snapshots = _read_cost_snapshots("openrouter")
+
+    deltas = _compute_deltas(snapshots, days)
+    return {
+        "ok": True,
+        "provider": "openrouter",
+        "display_name": display_name,
+        "supported": True,
+        "status": "available",
+        "window_days": days,
+        "snapshots": deltas,
+        "limit": key_info.get("limit"),
+        "label": key_info.get("label") or "OpenRouter credits",
+        "message": "OpenRouter cost history loaded.",
+    }
+
+
 # SECTION: Public API
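
The snapshot helpers above combine two protections: an atomic file replace and a lock around the whole read-modify-write cycle. The following is a minimal sketch of that pattern under simplified assumptions. The names `write_json_atomically`, `upsert_day`, and `_LOCK` are hypothetical, and the file holds a flat date-to-usage mapping instead of the real snapshot schema.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_LOCK = threading.Lock()  # hypothetical process-local lock


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        try:
            os.unlink(tmp_path)  # do not leave the temp file behind
        except OSError:
            pass
        raise


def upsert_day(path: Path, date: str, used: float) -> dict:
    """Read, update, and rewrite the file while holding the lock."""
    with _LOCK:
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            data = {}
        if not isinstance(data, dict):
            data = {}
        data[date] = used
        write_json_atomically(path, data)
        return data
```

Without the lock, two threads could both read the same old contents and the later `os.replace` would silently drop the other thread's entry. The atomic replace alone only guarantees that readers never see a half-written file. A `threading.Lock` does not coordinate across processes, so a multi-process deployment would need a different mechanism.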


@@ -2033,7 +2033,7 @@ from api.run_journal import (
    read_run_events,
    stale_interrupted_event,
 )
-from api.providers import get_providers, get_provider_quota, set_provider_key, remove_provider_key
+from api.providers import get_providers, get_provider_quota, get_provider_cost_history, set_provider_key, remove_provider_key
 from api.onboarding import (
    apply_onboarding_setup,
    get_onboarding_status,
@@ -3323,6 +3323,16 @@ def handle_get(handler, parsed) -> bool:
        refresh = (query.get("refresh", [""])[0] or "").strip().lower() in {"1", "true", "yes", "on"}
        return j(handler, get_provider_quota(provider_id, refresh=refresh))

+    if parsed.path == "/api/provider/cost-history":
+        query = parse_qs(parsed.query)
+        provider_id = (query.get("provider", [""])[0] or None)
+        days_raw = (query.get("days", ["7"])[0] or "7").strip()
+        try:
+            days = max(1, min(int(days_raw), 365))
+        except (ValueError, TypeError):
+            days = 7
+        return j(handler, get_provider_cost_history(provider_id, days))
+
    if parsed.path == "/api/settings":
        settings = load_settings()
        # Never expose the stored password hash to clients
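
The `days` handling in the new route can be tried in isolation. This is a sketch only: `parse_days` is a hypothetical helper that mirrors the clamp-and-default behaviour of the handler and takes a URL string instead of the handler's pre-parsed request object.

```python
from urllib.parse import parse_qs, urlparse


def parse_days(url: str, default: int = 7, lo: int = 1, hi: int = 365) -> int:
    """Extract ?days=N from a URL, clamp it to [lo, hi], fall back to default."""
    query = parse_qs(urlparse(url).query)
    raw = (query.get("days", [str(default)])[0] or str(default)).strip()
    try:
        return max(lo, min(int(raw), hi))
    except ValueError:
        # Non-numeric input falls back to the default window.
        return default


print(parse_days("/api/provider/cost-history?provider=openrouter&days=30"))  # 30
print(parse_days("/api/provider/cost-history?provider=openrouter&days=9999"))  # 365
print(parse_days("/api/provider/cost-history?provider=openrouter&days=abc"))  # 7
```

Clamping rather than rejecting out-of-range values keeps the endpoint forgiving for dashboard callers; whether that is the right trade-off for other clients is a design judgment the sketch does not settle.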
@@ -2464,6 +2464,145 @@ def _attempt_credential_self_heal(
        return None


+def _agent_cache_api_key_sig(resolved_api_key, credential_pool) -> str:
+    """Return the cache-signature component for runtime credentials.
+
+    Credential-pool providers can legitimately hand WebUI a different runtime
+    token on each request (round-robin pools, OAuth refresh, auth self-heal).
+    The AIAgent object is also where cross-turn memory-provider state lives, so
+    using the volatile token itself in the cache signature silently defeats the
+    per-session agent cache and drops warmed Hindsight prefetch results.
+    """
+    if credential_pool is not None:
+        return 'credential-pool'
+    import hashlib as _hashlib
+    return _hashlib.sha256((resolved_api_key or '').encode()).hexdigest()[:16]
+
+
+def _refresh_cached_agent_runtime(agent, agent_kwargs: dict) -> bool:
+    """Refresh volatile runtime credentials on a reused cached AIAgent.
+
+    The cache key intentionally ignores credential-pool token churn, but the
+    cached agent's LLM client still needs the latest selected/refreshed runtime
+    key. Keep long-lived provider/session state (memory prefetch, turn counters,
+    tool state) while swapping only the runtime credential/client.
+    """
+    if agent is None or not isinstance(agent_kwargs, dict):
+        return False
+
+    new_pool = agent_kwargs.get('credential_pool')
+    if new_pool is not None:
+        try:
+            agent._credential_pool = new_pool
+        except Exception:
+            pass
+
+    new_key = agent_kwargs.get('api_key') or ''
+    if not new_key:
+        return True
+
+    new_base = agent_kwargs.get('base_url') or getattr(agent, 'base_url', '') or ''
+    if getattr(agent, '_fallback_activated', False):
+        # Avoid mixing a refreshed primary credential into a live fallback
+        # runtime. Rebuilding is safer than mutating a fallback-active agent
+        # whose restore/cooldown state has not run yet for this turn.
+        return False
+
+    if new_key == (getattr(agent, 'api_key', '') or ''):
+        _refresh_cached_agent_primary_runtime_snapshot(agent)
+        return True
+
+    try:
+        if getattr(agent, 'api_mode', None) == 'anthropic_messages':
+            # Native Anthropic-style clients have their own construction path;
+            # switch_model() already handles token/client refresh there.
+            if hasattr(agent, 'switch_model'):
+                agent.switch_model(
+                    agent_kwargs.get('model') or getattr(agent, 'model', None),
+                    agent_kwargs.get('provider') or getattr(agent, 'provider', None),
+                    api_key=new_key,
+                    base_url=new_base,
+                    api_mode=agent_kwargs.get('api_mode') or getattr(agent, 'api_mode', ''),
+                )
+                return True
+            return False
+
+        if not hasattr(agent, '_client_kwargs') or not hasattr(agent, '_replace_primary_openai_client'):
+            # Test/fake-agent fallback: keep metadata accurate even if no real
+            # OpenAI client exists to rebuild.
+            agent.api_key = new_key
+            if new_base:
+                agent.base_url = new_base
+            _refresh_cached_agent_primary_runtime_snapshot(agent)
+            return True
+
+        client_kwargs = dict(getattr(agent, '_client_kwargs', {}) or {})
+        client_kwargs['api_key'] = new_key
+        if new_base:
+            client_kwargs['base_url'] = new_base
+        agent._client_kwargs = client_kwargs
+        agent.api_key = new_key
+        if new_base:
+            agent.base_url = new_base
+        if hasattr(agent, '_apply_client_headers_for_base_url'):
+            agent._apply_client_headers_for_base_url(agent.base_url)
+        rebuilt = bool(agent._replace_primary_openai_client(reason='webui_credential_refresh'))
+        if rebuilt:
+            _refresh_cached_agent_primary_runtime_snapshot(agent)
+        return rebuilt
+    except Exception:
+        logger.debug('[webui] Failed to refresh cached agent runtime credentials', exc_info=True)
+        return False
+
+
+def _refresh_cached_agent_primary_runtime_snapshot(agent) -> None:
+    """Keep AIAgent's primary-runtime snapshot aligned with refreshed creds.
+
+    Long-lived AIAgent instances use `_primary_runtime` to restore the preferred
+    provider after fallback/transport recovery. If WebUI refreshes a cached
+    agent's runtime token but leaves that snapshot stale, a later restore can
+    resurrect the old credential and undo the refresh.
+    """
+    rt = getattr(agent, '_primary_runtime', None)
+    if not isinstance(rt, dict):
+        return
+
+    base_url = getattr(agent, 'base_url', rt.get('base_url'))
+    api_key = getattr(agent, 'api_key', rt.get('api_key', ''))
+    client_kwargs = dict(getattr(agent, '_client_kwargs', None) or rt.get('client_kwargs', {}) or {})
+
+    rt['base_url'] = base_url
+    rt['api_key'] = api_key
+    rt['client_kwargs'] = client_kwargs
+
+    # The default context compressor usually tracks the primary runtime too;
+    # keep both the live compressor fields and the fallback-restoration
+    # snapshot aligned when those attributes exist.
+    cc = getattr(agent, 'context_compressor', None)
+    if cc is not None:
+        if hasattr(cc, 'base_url'):
+            cc.base_url = base_url
+        if hasattr(cc, 'api_key'):
+            cc.api_key = api_key
+        if 'compressor_base_url' in rt:
+            rt['compressor_base_url'] = getattr(cc, 'base_url', base_url)
+        if 'compressor_api_key' in rt:
+            rt['compressor_api_key'] = getattr(cc, 'api_key', api_key)
+    else:
+        if 'compressor_base_url' in rt:
+            rt['compressor_base_url'] = base_url
+        if 'compressor_api_key' in rt:
+            rt['compressor_api_key'] = api_key
+
+    if getattr(agent, 'api_mode', None) == 'anthropic_messages':
+        if hasattr(agent, '_anthropic_api_key'):
+            rt['anthropic_api_key'] = getattr(agent, '_anthropic_api_key')
+        if hasattr(agent, '_anthropic_base_url'):
+            rt['anthropic_base_url'] = getattr(agent, '_anthropic_base_url')
+        if hasattr(agent, '_is_anthropic_oauth'):
+            rt['is_anthropic_oauth'] = getattr(agent, '_is_anthropic_oauth')
+
+
 def _run_agent_streaming(
    session_id,
    msg_text,
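
The effect of the credential-aware signature component can be seen with a reduced example. `api_key_signature` follows the logic of `_agent_cache_api_key_sig` above; `cache_signature` is a hypothetical stand-in for the full signature, which in the real code covers more fields and may be hashed differently.

```python
import hashlib
import json


def api_key_signature(api_key, credential_pool):
    """Stable sentinel for pool routes, short key hash for static keys."""
    if credential_pool is not None:
        return "credential-pool"
    return hashlib.sha256((api_key or "").encode()).hexdigest()[:16]


def cache_signature(model, provider, api_key, credential_pool=None):
    """Hypothetical reduced signature: only four of the real fields."""
    blob = json.dumps([
        model or "",
        provider or "",
        api_key_signature(api_key, credential_pool),
        bool(credential_pool),
    ])
    return hashlib.sha256(blob.encode()).hexdigest()


pool = object()  # placeholder for a real credential pool object
same_pool = cache_signature("m", "p", "tok-a", pool) == cache_signature("m", "p", "tok-b", pool)
static_differs = cache_signature("m", "p", "key-1") != cache_signature("m", "p", "key-2")
print(same_pool, static_differs)  # True True
```

With a pool present, rotating tokens no longer change the signature, so the cached agent is reused. With a static key, changing the key still produces a new signature and forces a rebuild.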
@@ -3349,11 +3488,16 @@ def _run_agent_streaming(
                import hashlib as _hashlib
                import json as _json
                from api.config import SESSION_AGENT_CACHE, SESSION_AGENT_CACHE_LOCK
+                _credential_pool = _rt.get('credential_pool')
                _sig_blob = _json.dumps([
                    resolved_model or '',
-                    _hashlib.sha256((resolved_api_key or '').encode()).hexdigest()[:16],
+                    _agent_cache_api_key_sig(resolved_api_key, _credential_pool),
                    resolved_base_url or '',
                    resolved_provider or '',
+                    _rt.get('api_mode') or '',
+                    _rt.get('command') or '',
+                    _rt.get('args') or [],
+                    bool(_credential_pool),
                    _max_iterations_cfg or '',
                    _max_tokens_cfg or '',
                    _fallback_resolved or {},
@@ -3377,6 +3521,23 @@ def _run_agent_streaming(
                        SESSION_AGENT_CACHE.move_to_end(session_id)  # LRU: mark as recently used
                        logger.debug('[webui] Reusing cached agent for session %s', session_id)

+                if agent is not None:
+                    # Refresh volatile runtime credentials selected from provider
+                    # pools without discarding cross-turn agent/provider state.
+                    if not _refresh_cached_agent_runtime(agent, _agent_kwargs):
+                        logger.warning(
+                            '[webui] Cached agent runtime could not be safely refreshed; rebuilding agent for session %s',
+                            session_id,
+                        )
+                        try:
+                            if getattr(agent, '_session_db', None) is not None:
+                                agent._session_db.close()
+                        except Exception:
+                            pass
+                        with SESSION_AGENT_CACHE_LOCK:
+                            SESSION_AGENT_CACHE.pop(session_id, None)
+                        agent = None
+
                if agent is not None:
                    # Refresh per-turn callbacks — these close over request-scoped
                    # objects (put queue, cancel_event) that are new each request.
@@ -3426,7 +3587,6 @@ def _run_agent_streaming(
                            # released until GC finalizes the agent, which on a
                            # long-running server may be never. Close it
                            # explicitly so the WAL handles release immediately.
-                            # (Opus pre-release follow-up to #1421.)
                            try:
                                _evicted_agent = evicted_entry[0] if isinstance(evicted_entry, tuple) else None
                                if _evicted_agent is not None and getattr(_evicted_agent, '_session_db', None) is not None:
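
The cache handling in these hunks relies on two habits: marking an entry as recently used with `move_to_end`, and explicitly closing resources held by an evicted entry. A minimal sketch of that combination follows. `ClosingLRU` and `FakeResource` are hypothetical, the sketch closes the cached value itself rather than a nested `_session_db`, and it omits the lock that the real code takes around `SESSION_AGENT_CACHE`.

```python
from collections import OrderedDict


class ClosingLRU:
    """Tiny LRU cache that closes evicted values (illustrative only)."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self._max_size:
            # popitem(last=False) removes the least recently used entry.
            _, evicted = self._items.popitem(last=False)
            close = getattr(evicted, "close", None)
            if callable(close):
                try:
                    close()  # release handles now instead of waiting for GC
                except Exception:
                    pass


class FakeResource:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


cache = ClosingLRU(max_size=2)
a, b, c = FakeResource(), FakeResource(), FakeResource()
cache.put("a", a)
cache.put("b", b)
cache.get("a")     # "a" becomes most recently used
cache.put("c", c)  # evicts "b", the least recently used entry
print(a.closed, b.closed, c.closed)  # False True False
```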
@@ -410,6 +410,16 @@ function closeLiveStream(sessionId, streamId){
  delete LIVE_STREAMS[sessionId];
 }

+function closeOtherLiveStreams(activeSid){
+  // Keep the live token SSE connection scoped to the conversation pane the user
+  // is actually viewing. Background sessions still show running/finished state
+  // through the session list and can reattach when selected, but they should not
+  // keep one EventSource each and exhaust the browser connection pool (#2313).
+  for(const sid of Object.keys(LIVE_STREAMS)){
+    if(sid!==activeSid) closeLiveStream(sid);
+  }
+}
+
 function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
  if(!activeSid||!streamId) return;
  const reconnecting=!!options.reconnecting;
@@ -427,6 +437,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
  ){
    return;
  }
+  closeOtherLiveStreams(activeSid);
  closeLiveStream(activeSid);

  let assistantText='';
@@ -513,6 +524,9 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
      toolCalls:inflight.toolCalls||[],
    });
  }
+  function snapshotLiveTurn(){
+    if(typeof snapshotLiveTurnHtmlForSession==='function') snapshotLiveTurnHtmlForSession(activeSid);
+  }
  // Throttled variant for token-by-token updates. persistInflightState()
  // calls saveInflightState() which does JSON.parse + JSON.stringify + write
  // on the entire inflight map every call. On a fast model at 60 tok/s with
@@ -1170,6 +1184,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
        }
      }
      scrollIfPinned();
+      snapshotLiveTurn();
    };
    const frameIntervalMs=_shouldUseStreamFade()?33:66;
    if(sinceLastMs>=frameIntervalMs){
@@ -1197,19 +1212,18 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){

    source.addEventListener('token',e=>{
      if(_terminalStateReached||_streamFinalized) return;
-      if(!S.session||S.session.session_id!==activeSid) return;
      const d=JSON.parse(e.data);
      assistantText+=d.text;
      syncInflightAssistantMessage();
      if(!S.session||S.session.session_id!==activeSid) return;
      const parsed=_parseStreamState();
+      if(_freshSegment&&window._showThinking!==false) appendThinking(_liveThinkingText());
      if(String((parsed&&parsed.displayText)||'').trim()||assistantRow) ensureAssistantRow();
      _scheduleRender();
    });

    source.addEventListener('interim_assistant',e=>{
      if(_terminalStateReached||_streamFinalized) return;
-      if(!S.session||S.session.session_id!==activeSid) return;
      const d=JSON.parse(e.data);
      const visible=String(d&&d.text?d.text:'').trim();
      const alreadyStreamed=!!(d&&d.already_streamed);
@@ -1217,19 +1231,19 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
        return;
      }
      if(alreadyStreamed){
+        if(!S.session||S.session.session_id!==activeSid) return;
        _resetAssistantSegment();
        return;
      }
-      assistantText+=visible;
+      assistantText += assistantText ? `\n\n${visible}` : visible;
      visibleInterimSnippets.push(visible);
      syncInflightAssistantMessage();
      if(!S.session||S.session.session_id!==activeSid) return;
-      const parsed=_parseStreamState();
      if(window._showThinking!==false){
        if(typeof updateThinking==='function') updateThinking(_liveThinkingText());
        else appendThinking(_liveThinkingText());
      }
-      if(String((parsed&&parsed.displayText)||'').trim()||assistantRow) ensureAssistantRow();
+      ensureAssistantRow(true);
      _scheduleRender();
    });

@@ -1274,6 +1288,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
      liveReasoningText='';
      const oldRow=$('toolRunningRow');if(oldRow)oldRow.remove();
      appendLiveToolCard(tc);
+      snapshotLiveTurn();
      // Reset the live assistant row reference so that any text tokens arriving
      // after this tool call create a NEW segment appended below the tool card,
      // rather than updating the old segment that sits above it in the DOM.
@@ -1310,6 +1325,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
      persistInflightState();
      if(!S.session||S.session.session_id!==activeSid) return;
      appendLiveToolCard(tc);
+      snapshotLiveTurn();
      scrollIfPinned();
    });

@@ -1603,14 +1619,25 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
      try{ d=JSON.parse(e.data||'{}')||{}; }catch(_){ d={}; }
      if(d.session_id&&d.session_id!==activeSid) return;
      if(typeof setCompressionUi==='function'){
-        setCompressionUi({
+        const state={
          sessionId:activeSid,
          phase:'running',
          automatic:true,
          message:d.message||'Auto-compressing context...',
-        });
+        };
+        setCompressionUi(state);
+        const liveAnswerStarted=!!(assistantRow||String(((_parseStreamState&&_parseStreamState())||{}).displayText||'').trim());
+        if(liveAnswerStarted&&typeof appendLiveCompressionCard==='function'&&appendLiveCompressionCard(state)){
+          // The live card is now anchored in the turn. Keeping the same running
+          // state in global transient UI makes later renderMessages() calls insert
+          // a duplicate Automatic Compression card.
+          window._compressionUi=null;
+          snapshotLiveTurn();
+          return;
+        }
      }
      if(typeof renderMessages==='function') renderMessages({preserveScroll:true});
+      snapshotLiveTurn();
    });

    source.addEventListener('compressed',e=>{
@@ -1627,13 +1654,22 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
        _syncCtxIndicator(S.lastUsage);
      }
      if(typeof setCompressionUi==='function'){
-        setCompressionUi({
+        const state={
          sessionId:activeSid,
          phase:'done',
          automatic:true,
          message,
          summary:{headline:message},
-        });
+        };
+        setCompressionUi(state);
+        const appended=typeof appendLiveCompressionCard==='function'&&appendLiveCompressionCard(state);
+        if(appended){
+          // The live card is now anchored in the turn. Do not keep the automatic
+          // completion state as global transient UI; otherwise every subsequent
+          // render projects the same Auto Compression card again.
+          window._compressionUi=null;
+          snapshotLiveTurn();
+        }
      }
      if(typeof _setCompressionSessionLock==='function') _setCompressionSessionLock(null);
      if(!S.busy&&typeof renderMessages==='function') renderMessages();
@@ -558,8 +558,10 @@ async function loadSession(sid){
    return true;
  }

-  // Phase 2a: If session is streaming, restore from INFLIGHT cache before
-  // loading full messages (INFLIGHT state is self-contained and sufficient).
+  // Phase 2a: If session is streaming, restore the persisted transcript first,
+  // then merge the local INFLIGHT live tail. INFLIGHT is a recovery tail, not a
+  // complete transcript; treating it as the full source makes long sessions look
+  // like they lost history after switching away and back.
  if(!INFLIGHT[sid]&&activeStreamId&&typeof loadInflightState==='function'){
    const stored=loadInflightState(sid, activeStreamId);
    if(stored){
@@ -573,8 +575,15 @@ async function loadSession(sid){
  }

  if(INFLIGHT[sid]){
-    // Streaming session: use cached INFLIGHT messages (already has pending assistant output).
-    S.messages=INFLIGHT[sid].messages;
+    const inflightMessages=INFLIGHT[sid].messages||[];
+    S.messages=[];
+    S.toolCalls=[];
+    try {
+      await _ensureMessagesLoaded(sid);
+    } catch(e) {
+      S.messages=inflightMessages;
+    }
+    S.messages=_mergeInflightTailMessages(S.messages,inflightMessages);
    S.toolCalls=(INFLIGHT[sid].toolCalls||[]);
    if(_mergePendingSessionMessage(S.session,S.messages)){
      INFLIGHT[sid].messages=S.messages;
@@ -584,12 +593,17 @@ async function loadSession(sid){
    // replaying persisted live tools so the compact Activity count survives
    // switching away from and back to an active chat (#1715).
    S.activeStreamId=activeStreamId;
-    syncTopbar();renderMessages();appendThinking();loadDir('.');
-    clearLiveToolCards();
-    if(typeof placeLiveToolCardsHost==='function') placeLiveToolCardsHost();
-    for(const tc of (S.toolCalls||[])){
-      if(tc&&tc.name) appendLiveToolCard(tc);
+    syncTopbar();renderMessages();
+    const restoredLiveTurn=typeof restoreLiveTurnHtmlForSession==='function'&&restoreLiveTurnHtmlForSession(sid);
+    if(!restoredLiveTurn){
+      appendThinking();
+      clearLiveToolCards();
+      if(typeof placeLiveToolCardsHost==='function') placeLiveToolCardsHost();
+      for(const tc of (S.toolCalls||[])){
+        if(tc&&tc.name) appendLiveToolCard(tc);
+      }
    }
+    loadDir('.');
    setBusy(true);setComposerStatus('');
    startApprovalPolling(sid);
    if(typeof startClarifyPolling==='function') startClarifyPolling(sid);
@@ -1142,6 +1156,40 @@ async function _ensureMessagesLoaded(sid) {
  }
 }

+function _messageComparableText(m){
+  if(!m) return '';
+  if(typeof msgContent==='function'){
+    try{return String(msgContent(m)||'').trim();}
+    catch(_){}
+  }
+  return String(m.content||'').trim();
+}
+
+function _sameTranscriptMessage(a,b){
+  return !!(a&&b) &&
+    String(a.role||'')===String(b.role||'') &&
+    _messageComparableText(a)===_messageComparableText(b);
+}
+
+function _mergeInflightTailMessages(baseMessages, inflightMessages){
+  const base=Array.isArray(baseMessages)?baseMessages:[];
+  const inflight=Array.isArray(inflightMessages)?inflightMessages:[];
+  let liveIdx=-1;
+  for(let i=inflight.length-1;i>=0;i--){
+    if(inflight[i]&&inflight[i]._live){liveIdx=i;break;}
+  }
+  if(liveIdx<0) return base;
+  let start=liveIdx;
+  if(liveIdx>0&&inflight[liveIdx-1]&&inflight[liveIdx-1].role==='user') start=liveIdx-1;
+  const tail=inflight.slice(start).filter(m=>m&&m.role);
+  const merged=[...base];
+  for(const msg of tail){
+    const duplicate=merged.slice(-Math.max(5,tail.length+2)).some(existing=>_sameTranscriptMessage(existing,msg));
+    if(!duplicate) merged.push(msg);
+  }
+  return merged;
+}
+
 // Load older messages when the user scrolls to the top of the conversation.
 // Prepends them to S.messages and re-renders, preserving scroll position.
 let _loadingOlder = false;
@@ -3720,6 +3720,67 @@ function clearInflightState(sid){
  }catch(_){ }
 }

+function snapshotLiveTurnHtmlForSession(sid){
+  // Keep the DOM snapshot memory-only. Persisted INFLIGHT state intentionally
+  // stores structured stream state, not outerHTML, so a hard reload still uses
+  // the safer flat replay path instead of reviving stale nodes/listeners.
+  if(!sid||!INFLIGHT[sid]) return;
+  const turn=$('liveAssistantTurn');
+  if(!turn) return;
+  if(turn.dataset&&turn.dataset.sessionId&&turn.dataset.sessionId!==sid) return;
+  INFLIGHT[sid].liveTurnHtml=turn.outerHTML;
+}
+
+function _liveAssistantSegmentTextLength(seg){
+  if(!seg) return 0;
+  const body=seg.querySelector('.msg-body')||seg;
+  return String(body.textContent||'').trim().length;
+}
+
+function _mergeRestoredLiveAssistantSegment(restored, existing){
+  if(!restored||!existing) return;
+  const existingLive=existing.querySelector('[data-live-assistant="1"]');
+  if(!existingLive) return;
+  const restoredLive=restored.querySelector('[data-live-assistant="1"]');
+  const existingLen=_liveAssistantSegmentTextLength(existingLive);
+  const restoredLen=_liveAssistantSegmentTextLength(restoredLive);
+  if(existingLen<=restoredLen) return;
+  const replacement=existingLive.cloneNode(true);
+  if(restoredLive){
+    restoredLive.replaceWith(replacement);
+    return;
+  }
+  const blocks=_assistantTurnBlocks(restored);
+  if(!blocks) return;
+  const anchor=Array.from(blocks.children).filter(el=>
+    el.matches('.tool-call-group,.tool-card-row,.agent-activity-thinking,.thinking-card-row,[data-live-assistant="1"]')
+  ).pop();
+  if(anchor) anchor.insertAdjacentElement('afterend', replacement);
+  else blocks.appendChild(replacement);
+}
+
+function restoreLiveTurnHtmlForSession(sid){
+  const inflight=INFLIGHT[sid];
+  if(!sid||!inflight||!inflight.liveTurnHtml) return false;
+  const inner=$('msgInner');
+  if(!inner) return false;
+  const template=document.createElement('template');
+  template.innerHTML=String(inflight.liveTurnHtml||'').trim();
+  const restored=template.content.firstElementChild;
+  if(!restored) return false;
+  restored.id='liveAssistantTurn';
+  if(S.session) restored.dataset.sessionId=S.session.session_id;
+  const existing=$('liveAssistantTurn');
+  _mergeRestoredLiveAssistantSegment(restored, existing);
+  if(existing) existing.replaceWith(restored);
+  else inner.appendChild(restored);
+  const liveGroup=restored.querySelector('.tool-call-group[data-live-tool-call-group="1"]');
+  if(liveGroup&&typeof _startActivityElapsedTimer==='function') _startActivityElapsedTimer(liveGroup);
+  if(typeof placeLiveToolCardsHost==='function') placeLiveToolCardsHost();
+  requestAnimationFrame(()=>postProcessRenderedMessages(restored));
+  return true;
+}
+
 function markInflight(sid, streamId) {
  localStorage.setItem(INFLIGHT_KEY, JSON.stringify({sid, streamId, ts: Date.now()}));
 }
@@ -4540,23 +4601,26 @@ function _createAssistantTurn(tsTitle='', tpsText=''){
  const row=document.createElement('div');
  row.className='msg-row assistant-turn';
  row.dataset.role='assistant';
+  if(S.session) row.dataset.sessionId=S.session.session_id;
  row.innerHTML=`${_assistantRoleHtml(tsTitle, tpsText)}<div class="assistant-turn-blocks"></div>`;
  return row;
 }
 function _assistantTurnBlocks(turn){
  return turn?turn.querySelector('.assistant-turn-blocks'):null;
 }
-function _thinkingCardHtml(text){
+function _thinkingCardHtml(text, open){
  const clean=_sanitizeThinkingDisplayText(text);
-  return `<div class="thinking-card"><div class="thinking-card-header" onclick="this.parentElement.classList.toggle('open')"><span class="thinking-card-icon">${li('lightbulb',14)}</span><span class="thinking-card-label">${t('thinking')}</span><span class="thinking-card-toggle">${li('chevron-right',12)}</span></div><div class="thinking-card-body"><pre>${esc(clean)}</pre></div></div>`;
+  return open
+    ? `<div class="thinking-card open"><div class="thinking-card-header" onclick="this.parentElement.classList.toggle('open')"><span class="thinking-card-icon">${li('lightbulb',14)}</span><span class="thinking-card-label">${t('thinking')}</span><span class="thinking-card-toggle">${li('chevron-right',12)}</span></div><div class="thinking-card-body"><pre>${esc(clean)}</pre></div></div>`
+    : `<div class="thinking-card"><div class="thinking-card-header" onclick="this.parentElement.classList.toggle('open')"><span class="thinking-card-icon">${li('lightbulb',14)}</span><span class="thinking-card-label">${t('thinking')}</span><span class="thinking-card-toggle">${li('chevron-right',12)}</span></div><div class="thinking-card-body"><pre>${esc(clean)}</pre></div></div>`;
 }
 function isSimplifiedToolCalling(){
  return window._simplifiedToolCalling!==false;
 }
-function _thinkingActivityNode(text){
+function _thinkingActivityNode(text, open){
  const row=document.createElement('div');
  row.className='agent-activity-thinking';
-  row.innerHTML=_thinkingCardHtml(text);
+  row.innerHTML=_thinkingCardHtml(text, open);
  return row;
 }
 // ── Activity-group user expand intent (#1298) ──────────────────────────────
@@ -4740,17 +4804,24 @@ function _compressionCardsHtml(state){
 }
 function _autoCompressionCardsHtml(state){
  const fallback='Context auto-compressed to continue the conversation';
-  const detail=String(state.message||fallback).trim()||fallback;
-  const preview=String(state.summary?.headline||detail).trim()||detail;
+  const running=state&&state.phase==='running';
+  const detail=running
+    ? (String(state.message||'Auto-compressing context...').trim()||'Auto-compressing context...')
+    : (String(state.message||fallback).trim()||fallback);
+  const preview=running
+    ? detail
+    : (String(state.summary?.headline||detail).trim()||detail);
  return `
    <div class="tool-card-row compression-card-row" data-compression-card="1">
      ${_compressionStatusCardHtml({
        statusLabel: t('auto_compress_label'),
        previewText: preview,
        detail,
-        icon: li('check',13),
-        open: false,
-        variantClass: 'tool-card-compress-complete tool-card-compress-auto',
+        icon: running ? '<span class="tool-card-running-dot"></span>' : li('check',13),
+        open: running,
+        variantClass: running
+          ? 'tool-card-compress-running tool-card-compress-auto'
+          : 'tool-card-compress-complete tool-card-compress-auto',
      })}
    </div>`;
 }
@@ -4760,6 +4831,26 @@ function _compressionCardsNode(state){
  wrap.innerHTML=`<div class="compression-turn-blocks">${_compressionCardsHtml(state)}</div>`;
  return wrap;
 }
+function appendLiveCompressionCard(state){
+  if(!S.session||!S.activeStreamId||!state) return false;
+  let turn=$('liveAssistantTurn');
+  if(!turn){
+    turn=_createAssistantTurn();
+    turn.id='liveAssistantTurn';
+    if(S.session) turn.dataset.sessionId=S.session.session_id;
+    $('msgInner').appendChild(turn);
+  }
+  const inner=_assistantTurnBlocks(turn);
+  if(!inner) return false;
+  const node=_compressionCardsNode(state);
+  if(!node) return false;
+  node.setAttribute('data-live-compression-card','1');
+  const existing=inner.querySelector('[data-live-compression-card="1"]');
+  if(existing) existing.replaceWith(node);
+  else inner.appendChild(node);
+  if(typeof scrollIfPinned==='function') scrollIfPinned();
+  return true;
+}
 function _isHandoffSummaryToolPayload(value){
  if(!value||typeof value!=='object'||Array.isArray(value)) return false;
  return value._handoff_summary_card === true;
@@ -5708,14 +5799,18 @@ function renderMessages(options){
        }
        if(!anchorRow) continue;
        const anchorParent=anchorRow.parentElement;
-        const insertAfterNode = anchorInsertAfter.get(anchorRow) || anchorRow;
+        let insertAfterNode = anchorInsertAfter.get(anchorRow) || anchorRow;
+        const thinkingText=assistantThinking.get(aIdx);
+        if(thinkingText){
+          const thinkingNode=_thinkingActivityNode(thinkingText, false);
+          anchorParent.insertBefore(thinkingNode, anchorRow);
+        }
+        if(!cards.length) continue;
        const group=ensureActivityGroup(anchorParent,{collapsed:true,anchor:insertAfterNode,activityKey:`assistant:${aIdx}`});
        const sourceMsg=S.messages[aIdx]||{};
        if(sourceMsg._turnDuration!==undefined) group.setAttribute('data-turn-duration', String(sourceMsg._turnDuration));
        const body=group&&group.querySelector('.tool-call-group-body');
        if(!body) continue;
-        const thinkingText=assistantThinking.get(aIdx);
-        if(thinkingText) body.appendChild(_thinkingActivityNode(thinkingText));
        for(const tc of cards){
          body.appendChild(buildToolCard(tc));
        }
@@ -6860,31 +6955,28 @@ function appendThinking(text=''){
    }
    return;
  }
-  if(!String(text||'').trim()){
-    scrollIfPinned();
-    return;
-  }
-  const allChildren=Array.from(blocks.children);
-  const anchor=allChildren.filter(el=>
-    el.id!=='toolRunningRow' &&
-    el.matches('[data-live-assistant="1"],.tool-call-group,.tool-card-row,.agent-activity-thinking')
-  ).pop();
-  const group=ensureActivityGroup(blocks,{live:true,collapsed:true,anchor,activityKey:_activityKeyForLiveTurn()});
-  const body=group&&group.querySelector('.tool-call-group-body');
-  if(!body) return;
-  let row=body.querySelector('.agent-activity-thinking[data-thinking-active="1"]');
+  const thinkingText=String(text||'').trim()||'Thinking…';
+  blocks.querySelectorAll('.tool-call-group[data-live-tool-call-group="1"][data-live-activity-current="1"]').forEach(group=>{
+    group.removeAttribute('data-live-activity-current');
+  });
+  let row=blocks.querySelector('.agent-activity-thinking[data-thinking-active="1"]');
  if(!row){
-    row=document.createElement('div');
-    row.className='agent-activity-thinking';
+    row=_thinkingActivityNode(thinkingText, false);
    row.setAttribute('data-thinking-active','1');
-    body.insertBefore(row, body.firstChild);
+    const allChildren=Array.from(blocks.children);
+    const anchor=allChildren.filter(el=>
+      el.id!=='toolRunningRow' &&
+      el.matches('[data-live-assistant="1"],.tool-call-group,.tool-card-row,.agent-activity-thinking')
+    ).pop();
+    if(anchor) anchor.insertAdjacentElement('afterend', row);
+    else blocks.appendChild(row);
+  }else{
+    _renderThinkingInto(row,thinkingText);
  }
-  _renderThinkingInto(row,text);
-  _syncToolCallGroupSummary(group);
  scrollIfPinned();
  if(_scrollPinned){
-    const thinkingBody=row&&row.querySelector('.thinking-card-body');
-    if(thinkingBody) thinkingBody.scrollTop=thinkingBody.scrollHeight;
+    const body=row&&row.querySelector('.thinking-card-body');
+    if(body) body.scrollTop=body.scrollHeight;
  }
 }
 function updateThinking(text=''){appendThinking(text);}
@@ -67,6 +67,18 @@ def test_auto_compression_completion_transition_is_preserved_after_running_liste
    assert "phase:'done'" in _compressed_listener_block()


+def test_auto_compression_does_not_rerender_over_live_answer_text():
+    block = _compressing_listener_block()
+    src = _read("static/ui.js")
+
+    assert "const liveAnswerStarted=" in block
+    assert "appendLiveCompressionCard(state)" in block
+    assert block.index("appendLiveCompressionCard(state)") < block.index("renderMessages({preserveScroll:true})")
+    assert "window._compressionUi=null;" in block
+    assert "function appendLiveCompressionCard(state)" in src
+    assert 'data-live-compression-card' in src
+
+
 def test_auto_compression_sse_uses_transient_card_not_fake_message():
    """Auto compression must not inject display-only text into S.messages."""
    src = _read("static/messages.js")
@@ -78,6 +90,9 @@ def test_auto_compression_sse_uses_transient_card_not_fake_message():
    assert "phase:'done'" in block
    assert "automatic:true" in block
    assert "_setCompressionSessionLock" in block
+    assert "const appended=typeof appendLiveCompressionCard==='function'&&appendLiveCompressionCard(state);" in block
+    assert "window._compressionUi=null;" in block
+    assert block.index("appendLiveCompressionCard(state)") < block.index("window._compressionUi=null;")


 def test_auto_compression_sse_keeps_inactive_and_malformed_paths_safe():
@@ -45,6 +45,25 @@ def test_attach_live_stream_reuses_existing_same_stream_transport():
    assert "return" in body[reuse_pos:close_pos]


+def test_attach_live_stream_closes_other_session_streams_before_opening_new_one():
+    """Only the selected conversation pane should hold an open chat SSE transport."""
+    body = _function_body(MESSAGES_JS, "attachLiveStream")
+    helper = _function_body(MESSAGES_JS, "closeOtherLiveStreams")
+
+    helper_compact = helper.replace(" ", "")
+    assert "Object.keys(LIVE_STREAMS)" in helper
+    assert "if(sid!==activeSid)closeLiveStream(sid)" in helper_compact
+
+    reuse_pos = body.find("const existingLive=LIVE_STREAMS[activeSid]")
+    close_other_pos = body.find("closeOtherLiveStreams(activeSid)")
+    close_current_pos = body.find("\n  closeLiveStream(activeSid);\n")
+    assert close_other_pos != -1, "attachLiveStream() should prune background chat EventSources"
+    assert reuse_pos < close_other_pos < close_current_pos, (
+        "same-stream reuse should happen before pruning, and pruning should happen "
+        "before replacing the active session transport"
+    )
+
+
 def test_attach_live_stream_updates_uploads_before_same_stream_reuse():
    """Reusing transport must not skip per-session uploaded attachment state."""
    body = _function_body(MESSAGES_JS, "attachLiveStream")
@@ -25,7 +25,7 @@ def test_load_session_inflight_reattach_merges_pending_user_message_before_rende
    block = _load_session_inflight_branch()

    merge_pos = block.find("_mergePendingSessionMessage")
-    render_pos = block.find("renderMessages();appendThinking();")
+    render_pos = block.find("renderMessages();")

    assert merge_pos != -1, (
        "loadSession's INFLIGHT reattach branch must merge pending_user_message "
@@ -36,6 +36,10 @@ def test_load_session_inflight_reattach_merges_pending_user_message_before_rende
        "The pending user row must be present before renderMessages() rebuilds "
        "the active transcript"
    )
+    assert "restoreLiveTurnHtmlForSession(sid)" in block, (
+        "Session restore may keep a live DOM snapshot instead of always "
+        "recreating a fresh Thinking row after renderMessages()"
+    )
    assert "INFLIGHT[sid].messages=S.messages;" in block, (
        "After merging the pending user row, the INFLIGHT cache should be updated "
        "so later session switches keep the same visible turn"
@@ -0,0 +1,555 @@
+"""Regression coverage for OpenRouter cost-history endpoint (#692).
+
+Tests cover:
+  - Happy-path snapshot append and delta computation
+  - Missing credentials (no_key)
+  - Unsupported provider (non-openrouter)
+  - Upstream failure (graceful degradation with stale data)
+  - Malformed / corrupt snapshot file on disk
+  - Idempotent same-day updates
+  - No real network calls or private credential leakage
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import sys
+import types
+import urllib.error
+from datetime import datetime, timezone
+from io import BytesIO
+from pathlib import Path
+from types import SimpleNamespace
+
+import api.config as config
+import api.profiles as profiles
+
+ROOT = Path(__file__).resolve().parents[1]
+
+
+class _FakeResponse:
+    """Minimal stand-in for urllib.request.urlopen context manager."""
+
+    def __init__(self, payload: bytes):
+        self._payload = payload
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *exc):
+        return False
+
+    def read(self):
+        return self._payload
+
+
+def _with_config(model=None, providers=None):
+    old_cfg = dict(config.cfg)
+    old_mtime = config._cfg_mtime
+    config.cfg.clear()
+    config.cfg["model"] = model or {}
+    if providers is not None:
+        config.cfg["providers"] = providers
+    try:
+        config._cfg_mtime = config.Path(config._get_config_path()).stat().st_mtime
+    except Exception:
+        config._cfg_mtime = 0.0
+    return old_cfg, old_mtime
+
+
+def _restore_config(old_cfg, old_mtime):
+    config.cfg.clear()
+    config.cfg.update(old_cfg)
+    config._cfg_mtime = old_mtime
+
+
+# ── Happy path: snapshot append + delta response ──────────────────────────────
+
+
+def test_openrouter_cost_history_happy_path(monkeypatch, tmp_path):
+    """On-demand snapshot append returns deltas from cumulative usage."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    call_count = {"n": 0}
+
+    def fake_urlopen(req, timeout):
+        call_count["n"] += 1
+        # Simulate cumulative usage: 5.0 credits used against a limit of 20
+        payload = {
+            "data": {
+                "limit_remaining": 15.0,
+                "usage": 5.0,
+                "limit": 20,
+                "label": "Test Label",
+                "key": "must-not-leak",
+            }
+        }
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    # Freeze "today" so the test is deterministic
+    fake_today = "2030-04-15"
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is True
+    assert result["provider"] == "openrouter"
+    assert result["supported"] is True
+    assert result["status"] == "available"
+    assert result["window_days"] == 7
+    assert result["limit"] == 20
+    assert result["label"] == "Test Label"
+    assert result["message"] == "OpenRouter cost history loaded."
+    # One snapshot for today
+    assert len(result["snapshots"]) == 1
+    snap = result["snapshots"][0]
+    assert snap["date"] == fake_today
+    assert snap["used"] == 5.0
+    assert snap["delta"] is None  # first entry has no previous baseline
+    # Verify the snapshot file was persisted
+    snap_file = tmp_path / "cost-snapshots" / "openrouter.json"
+    assert snap_file.exists()
+    persisted = json.loads(snap_file.read_text(encoding="utf-8"))
+    assert len(persisted["snapshots"]) == 1
+    # No credential leakage
+    assert "test-or-key" not in repr(result)
+    assert "must-not-leak" not in repr(result)
+
+
+def test_openrouter_cost_history_deltas_from_cumulative(monkeypatch, tmp_path):
+    """Deltas are computed as differences between consecutive cumulative values."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    # Pre-seed two historical snapshots
+    snap_dir = tmp_path / "cost-snapshots"
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    historical = {
+        "provider": "openrouter",
+        "snapshots": [
+            {"date": "2030-04-13", "used": 3.0, "limit": 20},
+            {"date": "2030-04-14", "used": 4.5, "limit": 20},
+        ],
+    }
+    (snap_dir / "openrouter.json").write_text(json.dumps(historical), encoding="utf-8")
+
+    call_count = {"n": 0}
+
+    def fake_urlopen(req, timeout):
+        call_count["n"] += 1
+        # Current cumulative usage is 7.0
+        payload = {"data": {"usage": 7.0, "limit": 20, "label": "Credits"}}
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    # Freeze "today"
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is True
+    snaps = result["snapshots"]
+    assert len(snaps) == 3
+    # Day 1: no delta (baseline)
+    assert snaps[0]["date"] == "2030-04-13"
+    assert snaps[0]["used"] == 3.0
+    assert snaps[0]["delta"] is None
+    # Day 2: delta = 4.5 - 3.0 = 1.5
+    assert snaps[1]["date"] == "2030-04-14"
+    assert snaps[1]["used"] == 4.5
+    assert snaps[1]["delta"] == 1.5
+    # Day 3 (today): delta = 7.0 - 4.5 = 2.5
+    assert snaps[2]["date"] == "2030-04-15"
+    assert snaps[2]["used"] == 7.0
+    assert snaps[2]["delta"] == 2.5
+
+
+def test_openrouter_cost_history_reset_uses_fresh_series_delta(monkeypatch, tmp_path):
+    """A lower cumulative value starts a fresh series instead of a negative bar."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    snap_dir = tmp_path / "cost-snapshots"
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    historical = {
+        "provider": "openrouter",
+        "snapshots": [
+            {"date": "2030-04-13", "used": 9.0, "limit": 20},
+            {"date": "2030-04-14", "used": 12.0, "limit": 20},
+        ],
+    }
+    (snap_dir / "openrouter.json").write_text(json.dumps(historical), encoding="utf-8")
+
+    def fake_urlopen(req, timeout):
+        # Simulate key rotation or provider reset: cumulative usage dropped.
+        payload = {"data": {"usage": 1.25, "limit": 20, "label": "Credits"}}
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is True
+    assert result["snapshots"][-1]["date"] == "2030-04-15"
+    assert result["snapshots"][-1]["used"] == 1.25
+    assert result["snapshots"][-1]["delta"] == 1.25
+    assert all(snap["delta"] is None or snap["delta"] >= 0 for snap in result["snapshots"])
+
+
+def test_cost_snapshot_append_uses_lock(monkeypatch, tmp_path):
+    """Snapshot append serializes the read-modify-write critical section."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+
+    import api.providers as providers
+
+    entered = {"count": 0}
+
+    class RecordingLock:
+        def __enter__(self):
+            entered["count"] += 1
+            return self
+
+        def __exit__(self, *exc):
+            return False
+
+    monkeypatch.setattr(providers, "_COST_SNAPSHOT_LOCK", RecordingLock())
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    snapshots = providers._append_cost_snapshot("openrouter", 4.0, 20.0)
+
+    assert entered["count"] == 1
+    assert snapshots == [{"date": "2030-04-15", "used": 4.0, "limit": 20.0}]
+
+
+# ── Missing credentials ───────────────────────────────────────────────────────
+
+
+def test_openrouter_cost_history_no_key(monkeypatch, tmp_path):
+    """No API key → safe no_key response without network call."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    def explode(*_a, **_kw):
+        raise AssertionError("should not call network without a key")
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", explode)
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is False
+    assert result["provider"] == "openrouter"
+    assert result["supported"] is True
+    assert result["status"] == "no_key"
+    assert "OPENROUTER_API_KEY" in result["message"]
+
+
+# ── Unsupported provider ──────────────────────────────────────────────────────
+
+
+def test_cost_history_unsupported_provider(monkeypatch, tmp_path):
+    """Non-openrouter providers return a clear unsupported response."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    old_cfg, old_mtime = _with_config(model={"provider": "anthropic"})
+
+    import api.providers as providers
+
+    try:
+        result = providers.get_provider_cost_history("anthropic", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is False
+    assert result["provider"] == "anthropic"
+    assert result["supported"] is False
+    assert result["status"] == "unsupported"
+    assert "openrouter" in result["message"].lower()
+
+
+def test_cost_history_missing_provider_param(monkeypatch, tmp_path):
+    """Empty provider parameter returns a clear error."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+
+    import api.providers as providers
+
+    result = providers.get_provider_cost_history("", days=7)
+    assert result["ok"] is False
+    assert result["status"] == "missing_provider"
+
+    result2 = providers.get_provider_cost_history(None, days=7)
+    assert result2["ok"] is False
+    assert result2["status"] == "missing_provider"
+
+
+# ── Upstream failure / graceful degradation ────────────────────────────────────
+
+
+def test_openrouter_cost_history_upstream_failure_degrades_gracefully(monkeypatch, tmp_path):
+    """When the OpenRouter API fails, previously persisted snapshots are still returned."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    # Pre-seed a snapshot
+    snap_dir = tmp_path / "cost-snapshots"
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    historical = {
+        "provider": "openrouter",
+        "snapshots": [
+            {"date": "2030-04-13", "used": 3.0, "limit": 20},
+        ],
+    }
+    (snap_dir / "openrouter.json").write_text(json.dumps(historical), encoding="utf-8")
+
+    req = providers.urllib.request.Request("https://openrouter.ai/api/v1/key")
+    def fake_urlopen(_req, timeout=None):
+        raise urllib.error.HTTPError(req.full_url, 500, "Server Error", {}, BytesIO(b"error"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is False
+    assert result["status"] == "unavailable"
+    # Still returns previously persisted data
+    assert len(result["snapshots"]) == 1
+    assert result["snapshots"][0]["date"] == "2030-04-13"
+    assert "temporarily unavailable" in result["message"].lower()
+
+
+def test_openrouter_cost_history_timeout_is_safe(monkeypatch, tmp_path):
+    """Timeout from OpenRouter does not produce a traceback or leak secrets."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    def fake_urlopen(_req, timeout=None):
+        raise TimeoutError("slow secret")
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["ok"] is False
+    assert result["status"] == "unavailable"
+    assert "test-or-key" not in repr(result)
+    assert "secret" not in repr(result).lower()
+
+
+# ── Malformed / corrupt snapshot file ─────────────────────────────────────────
+
+
+def test_openrouter_cost_history_corrupt_snapshot_file(monkeypatch, tmp_path):
+    """A corrupt snapshot file on disk is handled gracefully (treated as empty)."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    # Write a corrupt file
+    snap_dir = tmp_path / "cost-snapshots"
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    (snap_dir / "openrouter.json").write_text("NOT VALID JSON{{{{", encoding="utf-8")
+
+    def fake_urlopen(req, timeout):
+        payload = {"data": {"usage": 2.0, "limit": 10, "label": "Credits"}}
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    # Freeze "today"
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    # Corrupt file is ignored; fresh snapshot is created
+    assert result["ok"] is True
+    assert len(result["snapshots"]) == 1
+    assert result["snapshots"][0]["date"] == "2030-04-15"
+
+
+# ── Idempotent same-day updates ───────────────────────────────────────────────
+
+
+def test_openrouter_cost_history_same_day_idempotent(monkeypatch, tmp_path):
+    """Repeated calls on the same day update the snapshot in-place."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    call_count = {"n": 0}
+
+    def fake_urlopen(req, timeout):
+        call_count["n"] += 1
+        usage = 5.0 + call_count["n"]  # usage grows each call
+        payload = {"data": {"usage": usage, "limit": 20, "label": "Credits"}}
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    # Freeze "today"
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        r1 = providers.get_provider_cost_history("openrouter", days=7)
+        r2 = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    # Both calls succeed; only one snapshot date (today)
+    assert r1["ok"] is True
+    assert r2["ok"] is True
+    assert len(r1["snapshots"]) == 1
+    assert len(r2["snapshots"]) == 1
+    # Second call updated the same day's used value
+    assert r2["snapshots"][0]["used"] == 7.0  # 5.0 + 2 (second call)
+    # Verify persisted file has only one entry for today
+    snap_file = tmp_path / "cost-snapshots" / "openrouter.json"
+    persisted = json.loads(snap_file.read_text(encoding="utf-8"))
+    assert len(persisted["snapshots"]) == 1
+
+
+# ── Window days parameter ─────────────────────────────────────────────────────
+
+
+def test_openrouter_cost_history_window_days_truncation(monkeypatch, tmp_path):
+    """The days argument limits how many snapshots are returned (echoed back as window_days)."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    (tmp_path / ".env").write_text("OPENROUTER_API_KEY=test-or-key\n", encoding="utf-8")
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    # Pre-seed 5 historical snapshots
+    snap_dir = tmp_path / "cost-snapshots"
+    snap_dir.mkdir(parents=True, exist_ok=True)
+    historical = {
+        "provider": "openrouter",
+        "snapshots": [
+            {"date": f"2030-04-{d:02d}", "used": float(d), "limit": 20}
+            for d in range(10, 15)
+        ],
+    }
+    (snap_dir / "openrouter.json").write_text(json.dumps(historical), encoding="utf-8")
+
+    def fake_urlopen(req, timeout):
+        payload = {"data": {"usage": 15.0, "limit": 20, "label": "Credits"}}
+        return _FakeResponse(json.dumps(payload).encode("utf-8"))
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", fake_urlopen)
+
+    # Freeze "today"
+    monkeypatch.setattr(providers, "datetime", type("DT", (), {
+        "now": staticmethod(lambda tz=None: datetime(2030, 4, 15, 12, 0, 0, tzinfo=tz or timezone.utc)),
+        "strftime": datetime.strftime,
+    }))
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=3)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    # 5 historical + 1 today = 6 total, but days=3 returns only the last 3
+    assert result["ok"] is True
+    assert result["window_days"] == 3
+    assert len(result["snapshots"]) == 3
+    # The returned snapshots are the most recent 3
+    assert result["snapshots"][0]["date"] == "2030-04-13"
+    assert result["snapshots"][1]["date"] == "2030-04-14"
+    assert result["snapshots"][2]["date"] == "2030-04-15"
+
+
+# ── No real network calls ─────────────────────────────────────────────────────
+
+
+def test_cost_history_uses_no_real_network(monkeypatch, tmp_path):
+    """Without an API key, the no_key path must return before any urlopen call."""
+    monkeypatch.setattr(profiles, "get_active_hermes_home", lambda: tmp_path)
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    old_cfg, old_mtime = _with_config(model={"provider": "openrouter"})
+
+    import api.providers as providers
+
+    # Without a key, no network call is made at all
+    def explode(*_a, **_kw):
+        raise AssertionError("real network call detected")
+
+    monkeypatch.setattr(providers.urllib.request, "urlopen", explode)
+
+    try:
+        result = providers.get_provider_cost_history("openrouter", days=7)
+    finally:
+        _restore_config(old_cfg, old_mtime)
+
+    assert result["status"] == "no_key"
@@ -7,6 +7,7 @@ Each test is tagged with the sprint/commit where the bug was found and fixed.
 import json
 import os
 import pathlib
+import re
 import time
 import urllib.error
 import urllib.request
@@ -433,7 +434,7 @@ def test_loadSession_inflight_restores_live_tool_cards(cleanup_test_sessions):
    # INFLIGHT branch must call appendLiveToolCard
    inflight_idx = src.find("if(INFLIGHT[sid]){")
    assert inflight_idx >= 0, "INFLIGHT branch not found in loadSession"
-    inflight_block = src[inflight_idx:inflight_idx+900]
+    inflight_block = src[inflight_idx:inflight_idx+1600]
    assert "appendLiveToolCard" in inflight_block,         "loadSession INFLIGHT branch must restore live tool cards via appendLiveToolCard"
    assert "clearLiveToolCards" in inflight_block,         "loadSession INFLIGHT branch must clear old live cards before restoring"

@@ -582,10 +583,23 @@ def test_live_stream_tokens_persist_partial_assistant_for_session_switch(cleanup
        "messages.js must mark the persisted in-flight assistant row so renderMessages can re-anchor it"
    assert "syncInflightAssistantMessage();" in messages_src, \
        "token handler must update INFLIGHT state before checking the active session"
+    token_match = re.search(r"source\.addEventListener\('token',e=>\{(.*?)\n\s*\}\);", messages_src, re.S)
+    assert token_match, "token listener not found"
+    token_fn = token_match.group(1)
+    # str.find returns -1 on a miss, so require each position to be found before comparing order.
+    guard_pos = token_fn.find("if(!S.session||S.session.session_id!==activeSid) return;")
+    assert 0 <= token_fn.find("assistantText+=d.text") < guard_pos, \
+        "token events must update the active stream's local state before DOM-only active-session guards"
+    assert 0 <= token_fn.find("syncInflightAssistantMessage();") < guard_pos, \
+        "token events must persist INFLIGHT state even while another session is selected"
    assert "assistantRow&&!assistantRow.isConnected" in messages_src, \
        "live stream must drop stale detached assistant DOM references after session switches"
    assert "data-live-assistant" in ui_src, \
        "renderMessages must preserve a live-assistant DOM anchor when rebuilding the thread"
+    assert "snapshotLiveTurnHtmlForSession(activeSid)" in messages_src, \
+        "live turn DOM snapshots should preserve the interleaved timeline across session switches"
+    assert "restoreLiveTurnHtmlForSession(sid)" in (REPO_ROOT / "static/sessions.js").read_text(), \
+        "loadSession should restore the live turn snapshot before replaying flat tool cards"


 def test_inflight_session_state_tracks_live_tool_cards_per_session(cleanup_test_sessions):
@@ -610,15 +624,32 @@ def test_loadSession_inflight_sets_busy_before_renderMessages(cleanup_test_sessi
    src = (REPO_ROOT / "static/sessions.js").read_text()
    inflight_idx = src.find("if(INFLIGHT[sid]){")
    assert inflight_idx >= 0, "INFLIGHT branch not found in loadSession"
-    inflight_block = src[inflight_idx:inflight_idx+700]
+    inflight_block = src[inflight_idx:inflight_idx+1600]
    busy_pos = inflight_block.find("S.busy=true;")
-    render_pos = inflight_block.find("renderMessages();appendThinking();")
+    render_pos = inflight_block.find("renderMessages();")
    assert busy_pos >= 0, "loadSession INFLIGHT branch must set S.busy=true"
    assert render_pos >= 0, "loadSession INFLIGHT branch must call renderMessages()"
    assert busy_pos < render_pos, \
        "loadSession must set S.busy=true before renderMessages() to avoid duplicate tool cards"


+def test_loadSession_inflight_merges_tail_with_persisted_transcript(cleanup_test_sessions):
+    src = (REPO_ROOT / "static/sessions.js").read_text()
+    inflight_idx = src.find("if(INFLIGHT[sid]){")
+    assert inflight_idx >= 0, "INFLIGHT branch not found in loadSession"
+    inflight_block = src[inflight_idx:inflight_idx+1200]
+
+    assert "await _ensureMessagesLoaded(sid);" in inflight_block, (
+        "returning to an active stream should load the persisted transcript before adding the live tail"
+    )
+    assert "_mergeInflightTailMessages(S.messages,inflightMessages)" in inflight_block, (
+        "INFLIGHT messages should be merged as a tail, not replace the full transcript"
+    )
+    assert "function _mergeInflightTailMessages" in src, (
+        "sessions.js should centralize INFLIGHT tail merge logic for regression coverage"
+    )
+
+
 def test_loadSession_inflight_sets_active_stream_before_replaying_live_tool_cards(cleanup_test_sessions):
    """#1715: returning to an active chat must replay persisted tool cards.

@@ -630,7 +661,7 @@ def test_loadSession_inflight_sets_active_stream_before_replaying_live_tool_card
    src = (REPO_ROOT / "static/sessions.js").read_text()
    inflight_idx = src.find("if(INFLIGHT[sid]){")
    assert inflight_idx >= 0, "INFLIGHT branch not found in loadSession"
-    inflight_block = src[inflight_idx:inflight_idx+1000]
+    inflight_block = src[inflight_idx:inflight_idx+1600]
    active_pos = inflight_block.find("S.activeStreamId=activeStreamId;")
    replay_pos = inflight_block.find("appendLiveToolCard(tc);")
    attach_pos = inflight_block.find("attachLiveStream(sid, activeStreamId")
@@ -769,8 +800,8 @@ def test_ui_js_does_not_hide_anchor_segments_that_contain_thinking(cleanup_test_
    compact = src.replace(' ', '').replace('\n', '')
    assert "assistantThinking.set(rawIdx,thinkingText)" in compact, \
        "renderMessages must preserve reasoning text before hiding empty anchor segments"
-    assert "_thinkingActivityNode(thinkingText)" in src, \
-        "thinking-only assistant content should render inside the shared activity dropdown"
+    assert "_thinkingActivityNode(thinkingText, false)" in src, \
+        "thinking-only assistant content should render as a collapsed timeline Thinking card"


 def test_messages_js_live_assistant_segment_reuses_live_turn_wrapper(cleanup_test_sessions):
@@ -66,7 +66,15 @@ def test_replayed_long_task_events_enter_the_same_live_timeline_handlers():

    assert "updateThinking(" in wire_block, "reasoning replay should use the live Thinking card path"
    assert "appendLiveToolCard(tc)" in wire_block, "tool replay should use live tool-card rendering"
-    assert "setCompressionUi({" in wire_block, "compression replay should use the compression card path"
+    # Compression replay must dispatch through setCompressionUi(...). The handler
+    # body may build the state object inline (`setCompressionUi({...})`) or hoist
+    # it into a `state` variable first (`setCompressionUi(state)`) — both forms
+    # use the same compression-card path, so accept either. Pinning the literal
+    # `{` after the open-paren was over-specific and broke in v0.51.76 when
+    # PR #2347 hoisted the state object to share it with `appendLiveCompressionCard`.
+    assert ("setCompressionUi({" in wire_block) or ("setCompressionUi(state)" in wire_block), (
+        "compression replay should use the compression card path"
+    )
    assert "_runJournalReplayParams()" in MESSAGES_SRC, "replay attachments should enter _wireSSE via EventSource"


@@ -874,3 +874,139 @@ class TestCredentialPoolBackwardCompat(unittest.TestCase):
        # Agent was constructed successfully
        self.assertIn("session_id", captured["init_kwargs"])
        self.assertEqual(captured["init_kwargs"]["session_id"], "sess-compat-test")
+
+
+class TestAgentCacheCredentialPoolStability(unittest.TestCase):
+    """Credential-pool token churn must not evict cached WebUI agents."""
+
+    def test_credential_pool_signature_ignores_volatile_runtime_token(self):
+        import api.streaming as streaming
+
+        pool = object()
+        self.assertEqual(
+            streaming._agent_cache_api_key_sig('token-a', pool),
+            streaming._agent_cache_api_key_sig('token-b', pool),
+        )
+        self.assertNotEqual(
+            streaming._agent_cache_api_key_sig('token-a', None),
+            streaming._agent_cache_api_key_sig('token-b', None),
+        )
+
+    def test_cached_agent_runtime_refresh_swaps_key_without_losing_agent_state(self):
+        import api.streaming as streaming
+
+        class FakeAgent:
+            def __init__(self):
+                self.api_key = 'old-token'
+                self.base_url = 'https://chatgpt.com/backend-api/codex'
+                self.api_mode = 'codex_responses'
+                self._client_kwargs = {
+                    'api_key': 'old-token',
+                    'base_url': self.base_url,
+                    'default_headers': {'old': 'header'},
+                }
+                self._credential_pool = 'old-pool'
+                self.context_compressor = type('Compressor', (), {
+                    'base_url': self.base_url,
+                    'api_key': 'old-token',
+                })()
+                self._primary_runtime = {
+                    'base_url': self.base_url,
+                    'api_key': 'old-token',
+                    'client_kwargs': dict(self._client_kwargs),
+                    'compressor_base_url': self.base_url,
+                    'compressor_api_key': 'old-token',
+                }
+                self.header_refreshes = []
+                self.replacements = []
+                self.prefetch_survives = object()
+
+            def _apply_client_headers_for_base_url(self, base_url):
+                self.header_refreshes.append((base_url, self._client_kwargs['api_key']))
+                self._client_kwargs['default_headers'] = {'refreshed-for': self._client_kwargs['api_key']}
+
+            def _replace_primary_openai_client(self, *, reason):
+                self.replacements.append(reason)
+                return True
+
+        agent = FakeAgent()
+        preserved = agent.prefetch_survives
+        changed = streaming._refresh_cached_agent_runtime(agent, {
+            'api_key': 'new-token',
+            'base_url': 'https://chatgpt.com/backend-api/codex',
+            'credential_pool': 'new-pool',
+        })
+
+        self.assertTrue(changed)
+        self.assertIs(agent.prefetch_survives, preserved)
+        self.assertEqual(agent.api_key, 'new-token')
+        self.assertEqual(agent._client_kwargs['api_key'], 'new-token')
+        self.assertEqual(agent._credential_pool, 'new-pool')
+        self.assertEqual(agent._primary_runtime['api_key'], 'new-token')
+        self.assertEqual(agent._primary_runtime['client_kwargs']['api_key'], 'new-token')
+        self.assertEqual(agent._primary_runtime['compressor_api_key'], 'new-token')
+        self.assertEqual(agent.context_compressor.api_key, 'new-token')
+        self.assertEqual(agent.header_refreshes, [('https://chatgpt.com/backend-api/codex', 'new-token')])
+        self.assertEqual(agent.replacements, ['webui_credential_refresh'])
+
+    def test_same_key_refresh_repairs_stale_primary_runtime_snapshot(self):
+        import api.streaming as streaming
+
+        class FakeAgent:
+            api_key = 'current-token'
+            base_url = 'https://chatgpt.com/backend-api/codex'
+            api_mode = 'codex_responses'
+            _client_kwargs = {
+                'api_key': 'current-token',
+                'base_url': 'https://chatgpt.com/backend-api/codex',
+            }
+            _primary_runtime = {
+                'api_key': 'old-token',
+                'base_url': 'https://chatgpt.com/backend-api/codex',
+                'client_kwargs': {
+                    'api_key': 'old-token',
+                    'base_url': 'https://chatgpt.com/backend-api/codex',
+                },
+            }
+
+        agent = FakeAgent()
+        ok = streaming._refresh_cached_agent_runtime(agent, {'api_key': 'current-token'})
+
+        self.assertTrue(ok)
+        self.assertEqual(agent._primary_runtime['api_key'], 'current-token')
+        self.assertEqual(agent._primary_runtime['client_kwargs']['api_key'], 'current-token')
+
+    def test_fallback_active_refresh_requests_rebuild_without_mutating_fallback(self):
+        import api.streaming as streaming
+
+        class FakeAgent:
+            api_key = 'fallback-token'
+            base_url = 'https://fallback.example/v1'
+            api_mode = 'codex_responses'
+            _fallback_activated = True
+            _client_kwargs = {
+                'api_key': 'fallback-token',
+                'base_url': 'https://fallback.example/v1',
+            }
+            _primary_runtime = {
+                'api_key': 'old-primary-token',
+                'base_url': 'https://chatgpt.com/backend-api/codex',
+                'client_kwargs': {
+                    'api_key': 'old-primary-token',
+                    'base_url': 'https://chatgpt.com/backend-api/codex',
+                },
+                'compressor_api_key': 'old-primary-token',
+                'compressor_base_url': 'https://chatgpt.com/backend-api/codex',
+            }
+
+        agent = FakeAgent()
+        ok = streaming._refresh_cached_agent_runtime(agent, {
+            'api_key': 'new-primary-token',
+            'base_url': 'https://chatgpt.com/backend-api/codex',
+        })
+
+        self.assertFalse(ok)
+        self.assertEqual(agent.api_key, 'fallback-token')
+        self.assertEqual(agent._client_kwargs['api_key'], 'fallback-token')
+        self.assertEqual(agent._primary_runtime['api_key'], 'old-primary-token')
+        self.assertEqual(agent._primary_runtime['client_kwargs']['api_key'], 'old-primary-token')
@@ -260,35 +260,49 @@ class TestToolCallGroupingStatic:
            "Thinking echo suppression should remove exact visible assistant snippets from reasoning display."
        )

-    def test_tools_and_thinking_share_one_collapsed_activity_dropdown(self):
+    def test_compact_activity_keeps_thinking_cards_after_session_switch(self):
        ui_min = re.sub(r"\s+", "", UI_JS)
        assert "functionensureActivityGroup(" in ui_min, (
-            "Tool calls and thinking should share one agent-activity disclosure helper."
+            "Tool calls should still use the shared Activity disclosure helper."
        )
        assert "data-agent-activity-group" in UI_JS, (
-            "The shared tools/thinking disclosure needs a stable data-agent-activity-group hook."
-        )
-        assert "agent-activity-thinking" in UI_JS, (
-            "Thinking content should be nested inside the shared activity dropdown, not rendered separately."
+            "The Activity disclosure needs a stable data-agent-activity-group hook."
        )
        render_fn = _function_body(UI_JS, "renderMessages")
        assert "isSimplifiedToolCalling()" in render_fn and "assistantThinking.set(rawIdx, thinkingText)" in render_fn, (
-            "Settled thinking should move into the shared activity dropdown only when Compact tool activity is enabled."
+            "Compact settled transcript rendering should preserve Thinking cards after switching sessions."
+        )
+        assert "_thinkingActivityNode(thinkingText, false)" in render_fn, (
+            "Settled Thinking cards should render as collapsed timeline entries before related tools."
+        )
+        assert "anchorParent.insertBefore(thinkingNode, anchorRow)" in render_fn, (
+            "Settled Thinking cards should appear before their visible assistant process text."
        )
        assert "seg.insertAdjacentHTML('beforeend', _thinkingCardHtml(thinkingText))" in render_fn, (
            "The non-simplified path should preserve standalone settled thinking cards."
        )

-    def test_live_thinking_uses_shared_activity_dropdown_only_when_simplified(self):
+    def test_live_thinking_is_shown_while_still_splitting_tool_bursts(self):
        live_thinking_fn = _function_body(UI_JS, "appendThinking")
+        live_tool_fn = _function_body(UI_JS, "appendLiveToolCard")
+        helper = _function_body(UI_JS, "ensureActivityGroup")
        assert "isSimplifiedToolCalling()" in live_thinking_fn, (
            "Live thinking should branch on the Compact tool activity toggle."
        )
-        assert "ensureActivityGroup" in live_thinking_fn, (
-            "Compact live thinking should be inserted into the shared activity dropdown."
+        assert 'data-live-activity-current' in live_thinking_fn, (
+            "Starting a new live thinking block should close the previous live tool burst."
        )
-        assert "thinkingRow" in live_thinking_fn, (
-            "The non-simplified live thinking path should preserve the upstream #thinkingRow card."
+        assert "body.insertBefore(row, body.firstChild)" not in live_thinking_fn, (
+            "Live thinking should not be moved into the top Activity dropdown."
+        )
+        assert "_thinkingActivityNode(thinkingText, false)" in live_thinking_fn, (
+            "Compact live thinking should render a collapsed Thinking card in the timeline."
+        )
+        assert '[data-live-activity-current="1"]' in live_thinking_fn, (
+            "Starting a new Thinking card should mark the previous live tool burst as no longer current."
+        )
+        assert "body.querySelector" in live_tool_fn and "data-live-tid" in live_tool_fn, (
+            "tool_complete must still update its current live Activity burst by tool id."
        )