Merge pull request #1371 from nesquena/release/v0.50.251

release: v0.50.251
This commit is contained in:
nesquena-hermes
2026-04-30 16:50:44 -07:00
committed by GitHub
17 changed files with 1347 additions and 28 deletions
+9
View File
@@ -2,6 +2,15 @@
## [Unreleased]
## [v0.50.251] — 2026-04-30
### Fixed
- **Sidebar lineage collapse now works for WebUI JSON sessions, not just imported gateway rows** — PR #1358 (v0.50.249) added the client-side lineage-collapse helper but `/api/sessions` only included `_lineage_root_id` for gateway-imported rows. WebUI JSON sessions (the common case) had no grouping key, so cross-surface continuation chains (CLI-close → WebUI continuation, or compression chains within WebUI) still rendered as separate sidebar rows. Now `/api/sessions` reads `parent_session_id` and `end_reason` from `state.db.sessions` for every WebUI session id in the sidebar payload, walks the parent chain when `end_reason in {'compression', 'cli_close'}`, and exposes `_lineage_root_id` + `_compression_segment_count`. Cycle-detected via a `seen` set; depth-bounded to 20 hops to cap pathological data. **Pre-release fix:** swapped the original full-table-scan for a parameterized `WHERE id IN (...)` query that hits PRIMARY KEY + `idx_sessions_parent` — ~50× faster at 1000 rows, scales linearly. **Pre-release fix:** chunked IN clause to 500 vars to stay under SQLITE_MAX_VARIABLE_NUMBER on older sqlite (Python 3.9 ships sqlite 3.31 with default limit 999) — without this a power user with 2000+ sessions in the sidebar would hit `OperationalError: too many SQL variables`, the silent except-wrapper would swallow it, and lineage collapse would never work for them. **Pre-release fix:** tightened `parent_session_id` exposure — only emitted when the parent's `end_reason` is `compression` or `cli_close` (not for `user_stop`/etc), since the frontend's `_sessionLineageKey` falls through to `parent_session_id` and would incorrectly collapse two children of a non-continuation parent into a single row. (`api/agent_sessions.py`, `api/models.py`, `tests/test_session_lineage_metadata_api.py`, `tests/test_pr1370_lineage_metadata_perf_and_orphan.py`, `tests/test_gateway_sync.py`) @dso2ng — PR #1370
- **Manual cron runs persist output and metadata like scheduled runs** — manual WebUI cron runs called `cron.scheduler.run_job(job)` and then only cleared the in-memory running flag. The job's output was dropped (never written via `save_job_output`) and `last_run_at` / `last_status` were never updated. Now the manual-run wrapper (`_run_cron_tracked`) matches the scheduled-cron path at `cron/scheduler.py:1334-1364` exactly: saves output, marks the job complete, treats empty `final_response` as a soft failure (with the same error string), and records failures via `mark_job_run(False, str(e))`. (`api/routes.py`, `tests/test_cron_manual_run_persistence.py`) @NocGeek — PR #1372 (split out from the held #1352 per pre-release feedback)
- **Reasoning trace, tool calls, and partial output preserved on Stop/Cancel** — three distinct data-loss paths fixed: §A reasoning text accumulated in a thread-local `_reasoning_text` was invisible to `cancel_stream()` because it went out of scope when the thread was interrupted; §B live tool calls in thread-local `_live_tool_calls` were similarly lost; §C reasoning-only streams (no visible tokens) produced no partial assistant message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. The fix mirrors the existing `STREAM_PARTIAL_TEXT` pattern (#893) by adding two new shared dicts (`STREAM_REASONING_TEXT`, `STREAM_LIVE_TOOL_CALLS`) populated during streaming and read by `cancel_stream()`. The cancel path now appends the partial assistant message when content text, reasoning trace, OR tool calls exist (not just text). Eliminates "paid tokens disappeared" reports on Stop. 8 regression tests covering all three sections plus tools+text combinations. (`api/config.py`, `api/streaming.py`, `tests/test_issue1361_cancel_data_loss.py`) @bergeouss — PR #1375, fixes #1361
- **New profiles route sessions to the profile dir on first use, not back to default**`get_hermes_home_for_profile()` had a `if profile_dir.is_dir(): return profile_dir; return _DEFAULT_HERMES_HOME` fallback. New profiles (no session yet, so no dir) routed every session back to default until the directory existed on disk — making profile switching silently broken for the first session of every new profile. Removed the `is_dir()` guard; the profile path is now returned unconditionally and the directory is created on first use by the agent/session layer. Path traversal is still blocked by the `_PROFILE_ID_RE` regex (`^[a-z0-9][a-z0-9_-]{0,63}$`); R19j tests were updated to pin that the regex is now the sole defense. R19c was tightened to assert the new behavior. 5 regression tests in `test_issue1195_session_profile_routing.py` covering existing-profile, non-existent-profile (the core fix), None, empty-string, and 'default' return paths. (`api/profiles.py`, `tests/test_issue798.py`, `tests/test_issue1195_session_profile_routing.py`) @bergeouss — PR #1373, fixes #1195
## [v0.50.250] — 2026-04-30
### Fixed
+117
View File
@@ -255,3 +255,120 @@ def read_importable_agent_session_rows(
if limit is None:
return projected
return projected[:max(0, int(limit))]
def read_session_lineage_metadata(db_path: Path, session_ids: list[str] | set[str]) -> dict[str, dict]:
"""Return compression-lineage metadata for known WebUI sidebar sessions.
WebUI sessions are persisted as JSON files, but Hermes Agent also mirrors
them into ``state.db.sessions`` for insights/session history. Compression
and cross-surface continuation create parent chains there. ``/api/sessions``
needs to surface that lineage to the sidebar so client-side collapse can
group logical continuations without mutating or deleting any session files.
Missing DBs, old schemas, or incomplete rows degrade to an empty mapping.
"""
wanted = {str(sid) for sid in (session_ids or []) if sid}
db_path = Path(db_path)
if not wanted or not db_path.exists():
return {}
try:
with sqlite3.connect(str(db_path)) as conn:
conn.row_factory = sqlite3.Row
cur = conn.cursor()
cur.execute("PRAGMA table_info(sessions)")
session_cols = {row[1] for row in cur.fetchall()}
if 'parent_session_id' not in session_cols or 'end_reason' not in session_cols:
return {}
# Scoped fetch via PRIMARY KEY + idx_sessions_parent rather than a
# full table scan. The sessions table grows unbounded over time
# (1000+ rows is normal, 10000+ for power users), and this function
# runs on every sidebar refresh — a full SELECT was ~50x slower
# than the indexed lookup at 1000 rows and scales linearly.
#
# Fetch the wanted ids first, then chase parent_session_id chains
# in batches until no new ids appear. Each batch hits PRIMARY KEY
# so it's effectively O(N) lookups.
#
# IN-clause is chunked to 500 to stay under SQLITE_MAX_VARIABLE_NUMBER
# on older sqlite (Python 3.9 ships sqlite 3.31 which defaults to 999;
# newer Python ships sqlite 3.32+ at 32766). On a power user with
# 2000+ sessions in the sidebar, an unchunked first hop would raise
# `OperationalError: too many SQL variables`, get swallowed by the
# except below, and silently disable lineage collapse forever.
# (Opus pre-release review of v0.50.251, SHOULD-FIX 2.)
IN_CHUNK = 500
rows: dict[str, dict] = {}
to_fetch = set(wanted)
# Cap walk depth to bound worst-case query count. Real lineage
# chains seen in production are <10 segments; anything longer is
# almost certainly pathological data and not worth chasing.
for _hop in range(20):
if not to_fetch:
break
fetch_list = list(to_fetch)
to_fetch = set()
for i in range(0, len(fetch_list), IN_CHUNK):
chunk = fetch_list[i:i + IN_CHUNK]
placeholders = ','.join('?' * len(chunk))
cur.execute(
f"SELECT id, parent_session_id, end_reason FROM sessions WHERE id IN ({placeholders})",
chunk,
)
for row in cur.fetchall():
rows[row['id']] = dict(row)
# Queue up parents we haven't fetched yet.
for sid in fetch_list:
parent_id = rows.get(sid, {}).get('parent_session_id')
if parent_id and parent_id not in rows and parent_id not in to_fetch:
to_fetch.add(parent_id)
except Exception:
return {}
metadata: dict[str, dict] = {}
for sid in wanted:
row = rows.get(sid)
if not row:
continue
parent_id = row.get('parent_session_id')
# Only expose parent_session_id when:
# 1) the parent actually exists in state.db (orphan refs would
# otherwise leak through and the frontend would treat them as
# sidebar grouping keys via #1358's _sessionLineageKey
# fall-through)
# 2) the parent's end_reason is one of {compression, cli_close} —
# i.e. only TRUE continuations. Without this, two distinct
# WebUI sessions sharing a `user_stop` parent would get
# collapsed into a single sidebar row by #1358's helper
# (it groups by parent_session_id as the third-fallback key).
# (Opus pre-release review of v0.50.251, SHOULD-FIX 1.)
parent_row = rows.get(parent_id) if parent_id else None
if parent_row and parent_row.get('end_reason') in {'compression', 'cli_close'}:
metadata.setdefault(sid, {})['parent_session_id'] = parent_id
root_id = sid
current_id = sid
segment_count = 1
seen = {sid}
while True:
current = rows.get(current_id)
parent_id = current.get('parent_session_id') if current else None
parent = rows.get(parent_id) if parent_id else None
if not parent or parent_id in seen:
break
if parent.get('end_reason') not in {'compression', 'cli_close'}:
break
root_id = parent_id
current_id = parent_id
seen.add(parent_id)
segment_count += 1
if root_id != sid:
entry = metadata.setdefault(sid, {})
entry['_lineage_root_id'] = root_id
entry['_compression_segment_count'] = segment_count
return metadata
+2
View File
@@ -2115,6 +2115,8 @@ STREAMS_LOCK = threading.Lock()
CANCEL_FLAGS: dict = {}
AGENT_INSTANCES: dict = {} # stream_id -> AIAgent instance for interrupt propagation
STREAM_PARTIAL_TEXT: dict = {} # stream_id -> partial assistant text accumulated during streaming
STREAM_REASONING_TEXT: dict = {} # stream_id -> reasoning trace accumulated during streaming (#1361 §A)
STREAM_LIVE_TOOL_CALLS: dict = {} # stream_id -> live tool calls accumulated during streaming (#1361 §B)
SERVER_START_TIME = time.time()
# Agent cache: reuse AIAgent across messages in the same WebUI session so that
+32 -1
View File
@@ -15,7 +15,7 @@ from api.config import (
get_effective_default_model, _get_session_agent_lock,
)
from api.workspace import get_last_workspace
from api.agent_sessions import read_importable_agent_session_rows
from api.agent_sessions import read_importable_agent_session_rows, read_session_lineage_metadata
logger = logging.getLogger(__name__)
@@ -746,6 +746,31 @@ def _hide_from_default_sidebar(session: dict) -> bool:
return source == 'cron' or sid.startswith('cron_')
def _active_state_db_path() -> Path:
"""Return state.db for the active Hermes profile, degrading to HERMES_HOME."""
try:
from api.profiles import get_active_hermes_home
hermes_home = Path(get_active_hermes_home()).expanduser().resolve()
except Exception:
hermes_home = Path(os.getenv('HERMES_HOME', str(HOME / '.hermes'))).expanduser().resolve()
return hermes_home / 'state.db'
def _enrich_sidebar_lineage_metadata(sessions: list[dict]) -> None:
"""Attach state.db compression lineage metadata used by sidebar collapse."""
try:
metadata = read_session_lineage_metadata(
_active_state_db_path(),
{s.get('session_id') for s in sessions},
)
except Exception:
return
for session in sessions:
sid = session.get('session_id')
if sid in metadata:
session.update(metadata[sid])
def all_sessions():
active_stream_ids = _active_stream_ids()
# Phase C: try index first for O(1) read; fall back to full scan
@@ -804,6 +829,7 @@ def all_sessions():
for s in result:
if not s.get('profile'):
s['profile'] = 'default'
_enrich_sidebar_lineage_metadata(result)
return result
except Exception:
logger.debug("Failed to load session index, falling back to full scan")
@@ -832,6 +858,7 @@ def all_sessions():
for s in result:
if not s.get('profile'):
s['profile'] = 'default'
_enrich_sidebar_lineage_metadata(result)
return result
@@ -1015,6 +1042,10 @@ def get_cli_sessions() -> list:
'raw_source': row.get('raw_source'),
'session_source': row.get('session_source'),
'source_label': row.get('source_label'),
'parent_session_id': row.get('parent_session_id'),
'_lineage_root_id': row.get('_lineage_root_id'),
'_lineage_tip_id': row.get('_lineage_tip_id'),
'_compression_segment_count': row.get('_compression_segment_count'),
'is_cli_session': True,
})
except Exception as _cli_err:
+2 -4
View File
@@ -150,12 +150,10 @@ def get_hermes_home_for_profile(name: str) -> Path:
empty, 'default', or does not match the profile-name format (rejects path
traversal such as '../../etc').
"""
if not name or name == 'default' or not _PROFILE_ID_RE.match(name):
if not name or name == 'default' or not _PROFILE_ID_RE.fullmatch(name):
return _DEFAULT_HERMES_HOME
profile_dir = _DEFAULT_HERMES_HOME / 'profiles' / name
if profile_dir.is_dir():
return profile_dir
return _DEFAULT_HERMES_HOME
return profile_dir
_TERMINAL_ENV_MAPPINGS = {
+20 -2
View File
@@ -95,10 +95,28 @@ def _cron_output_content_window(text: str, limit: int = _CRON_OUTPUT_CONTENT_LIM
def _run_cron_tracked(job):
"""Wrapper that tracks running state around cron.scheduler.run_job."""
from cron.scheduler import run_job # import here — runs inside a worker thread
from cron.jobs import mark_job_run, save_job_output
job_id = job.get("id", "")
try:
run_job(job)
success, output, final_response, error = run_job(job)
save_job_output(job_id, output)
# Match the scheduled cron path: an apparently successful run with no
# final response should not leave the job looking healthy.
if success and not final_response:
success = False
error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
mark_job_run(job_id, success, error)
except Exception as e:
logger.exception("Manual cron run failed for job %s", job_id)
try:
mark_job_run(job_id, False, str(e))
except Exception:
logger.debug("Failed to mark manual cron run failure for %s", job_id)
finally:
_mark_cron_done(job.get("id", ""))
_mark_cron_done(job_id)
_PROVIDER_ALIASES = {
"claude": "anthropic",
+79 -11
View File
@@ -21,6 +21,7 @@ logger = logging.getLogger(__name__)
from api.config import (
STREAMS, STREAMS_LOCK, CANCEL_FLAGS, AGENT_INSTANCES, STREAM_PARTIAL_TEXT,
STREAM_REASONING_TEXT, STREAM_LIVE_TOOL_CALLS,
LOCK, SESSIONS, SESSION_DIR,
_get_session_agent_lock, _set_thread_env, _clear_thread_env,
SESSION_AGENT_LOCKS, SESSION_AGENT_LOCKS_LOCK,
@@ -1373,6 +1374,8 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
with STREAMS_LOCK:
CANCEL_FLAGS[stream_id] = cancel_event
STREAM_PARTIAL_TEXT[stream_id] = '' # start accumulating partial text (#893)
STREAM_REASONING_TEXT[stream_id] = '' # start accumulating reasoning trace (#1361 §A)
STREAM_LIVE_TOOL_CALLS[stream_id] = [] # start accumulating tool calls (#1361 §B)
# Register this stream with the global streaming meter
meter().begin_session(stream_id)
@@ -1575,6 +1578,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
if text is None:
return
_reasoning_text += str(text)
# Mirror to shared dict so cancel_stream() can persist it (#1361 §A)
if stream_id in STREAM_REASONING_TEXT:
STREAM_REASONING_TEXT[stream_id] += str(text)
put('reasoning', {'text': str(text)})
# Track reasoning tokens in the meter so TPS reflects all AI output
meter().record_reasoning(stream_id, len(_reasoning_text))
@@ -1607,6 +1613,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
reason_text = preview if event_type == 'reasoning.available' else name
if reason_text:
_reasoning_text += str(reason_text)
# Mirror to shared dict so cancel_stream() can persist it (#1361 §A)
if stream_id in STREAM_REASONING_TEXT:
STREAM_REASONING_TEXT[stream_id] += str(reason_text)
put('reasoning', {'text': str(reason_text)})
meter().record_reasoning(stream_id, len(_reasoning_text))
_emit_metering()
@@ -1623,6 +1632,13 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
'name': name,
'args': args if isinstance(args, dict) else {},
})
# Mirror to shared dict so cancel_stream() can persist it (#1361 §B)
if stream_id in STREAM_LIVE_TOOL_CALLS:
STREAM_LIVE_TOOL_CALLS[stream_id].append({
'name': name,
'args': args if isinstance(args, dict) else {},
'done': False,
})
put('tool', {
'event_type': event_type or 'tool.started',
'name': name,
@@ -1651,6 +1667,16 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
live_tc['duration'] = cb_kwargs.get('duration')
live_tc['is_error'] = bool(cb_kwargs.get('is_error', False))
break
# Mirror done state to shared dict (#1361 §B)
if stream_id in STREAM_LIVE_TOOL_CALLS:
for shared_tc in reversed(STREAM_LIVE_TOOL_CALLS[stream_id]):
if shared_tc.get('done'):
continue
if not name or shared_tc.get('name') == name:
shared_tc['done'] = True
shared_tc['duration'] = cb_kwargs.get('duration')
shared_tc['is_error'] = bool(cb_kwargs.get('is_error', False))
break
# Signal the checkpoint thread that new work has completed (Issue #765).
# Each completed tool call is a meaningful unit of progress worth persisting.
_checkpoint_activity[0] += 1
@@ -2443,6 +2469,8 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
CANCEL_FLAGS.pop(stream_id, None)
AGENT_INSTANCES.pop(stream_id, None) # Clean up agent instance reference
STREAM_PARTIAL_TEXT.pop(stream_id, None) # Clean up partial text buffer (#893)
STREAM_REASONING_TEXT.pop(stream_id, None) # Clean up reasoning trace (#1361 §A)
STREAM_LIVE_TOOL_CALLS.pop(stream_id, None) # Clean up tool calls (#1361 §B)
# ============================================================
# SECTION: HTTP Request Handler
@@ -2630,6 +2658,17 @@ def cancel_stream(stream_id: str) -> bool:
live_partials = getattr(_live_config, 'STREAM_PARTIAL_TEXT', partial_texts)
if live_partials is not partial_texts:
_cancel_partial_text = live_partials.get(stream_id, '')
# Capture reasoning trace and live tool calls (#1361 §A + §B)
_cancel_reasoning = STREAM_REASONING_TEXT.get(stream_id, '')
if not _cancel_reasoning:
live_reasoning = getattr(_live_config, 'STREAM_REASONING_TEXT', STREAM_REASONING_TEXT)
if live_reasoning is not STREAM_REASONING_TEXT:
_cancel_reasoning = live_reasoning.get(stream_id, '')
_cancel_tool_calls = STREAM_LIVE_TOOL_CALLS.get(stream_id, [])
if not _cancel_tool_calls:
live_tools = getattr(_live_config, 'STREAM_LIVE_TOOL_CALLS', STREAM_LIVE_TOOL_CALLS)
if live_tools is not STREAM_LIVE_TOOL_CALLS:
_cancel_tool_calls = live_tools.get(stream_id, [])
# Session cleanup outside STREAMS_LOCK to preserve lock ordering.
# Acquire the per-session _agent_lock too, mirroring every other session
@@ -2708,7 +2747,13 @@ def cancel_stream(stream_id: str) -> bool:
# content IS kept in the history sent to the agent on the next user
# message, letting the model continue from where it was cut off.
# See the inner comment on the append call below for the rationale.
#
# #1361: Also persist reasoning trace and live tool calls that were
# accumulated in thread-local variables but invisible to the cancel path.
# This prevents paid-token data loss when cancelling mid-reasoning or
# mid-tool-execution.
partial_text = _cancel_partial_text.strip() if _cancel_partial_text else ''
_stripped = ''
if partial_text:
import re as _re
# Strip thinking/reasoning markup from partial content before saving.
@@ -2721,17 +2766,40 @@ def cancel_stream(stream_id: str) -> bool:
_stripped = _re.sub(r'<think(?:ing)?\b[^>]*>.*',
'', _stripped,
flags=_re.DOTALL | _re.IGNORECASE).strip()
if _stripped:
# Mark _partial=True for session/UI identification only.
# Deliberately NOT _error=True — the partial content is real model
# output and should be visible in conversation history so the model
# can continue from it on the next turn (#893).
_cs.messages.append({
'role': 'assistant',
'content': _stripped,
'_partial': True,
'timestamp': int(time.time()),
})
# Determine whether there is anything to preserve beyond just the
# cancel marker. Content text, reasoning trace, or tool calls all
# count (#1361 §C — previously only _stripped was checked, so a
# reasoning-only or tool-only stream produced NO partial message).
_has_reasoning = bool(_cancel_reasoning and _cancel_reasoning.strip())
_has_tools = bool(_cancel_tool_calls)
if _stripped or _has_reasoning or _has_tools:
_partial_msg: dict = {
'role': 'assistant',
'content': _stripped, # may be empty for reasoning/tool-only turns
'_partial': True,
'timestamp': int(time.time()),
}
if _has_reasoning:
_partial_msg['reasoning'] = _cancel_reasoning.strip()
if _has_tools:
# NOTE: store under the private '_partial_tool_calls' key
# (NOT 'tool_calls'). The captured entries use the WebUI
# internal shape {name, args, done, duration, is_error}
# — they do NOT carry the OpenAI/Anthropic API id +
# function: {name, arguments} envelope. If we put them
# under 'tool_calls', `_sanitize_messages_for_api`
# (which whitelists 'tool_calls' via _API_SAFE_MSG_KEYS)
# would forward them to the next-turn LLM call and
# strict providers (OpenAI, Anthropic, Z.AI/GLM) would
# 400 on the malformed entries — turning a "data lost
# on cancel" bug into a "next message returns 400"
# bug, which is worse. The underscore-prefixed key is
# not in the whitelist, so sanitize strips it. The UI
# reads it via static/messages.js and renders it
# alongside the regular tool_calls path.
# (Opus pre-release review pass 2 of v0.50.251.)
_partial_msg['_partial_tool_calls'] = list(_cancel_tool_calls)
_cs.messages.append(_partial_msg)
# Cancel marker — flagged _error=True so it is stripped from conversation
# history on the next turn (prevents model from seeing "Task cancelled."
# as a prior assistant reply).
+23
View File
@@ -0,0 +1,23 @@
# Upstream Issues — Root Cause Analysis
## #1256: Browser tools fail with "Playwright not installed"
### Root Cause
The check lives in **hermes-agent** (upstream), not hermes-webui:
```
hermes-agent/tools/browser_tool.py → check_browser_requirements()
```
`check_browser_requirements()` does not recognize CDP (Chrome DevTools Protocol) mode — it only looks for a local Playwright/Puppeteer install. When the agent runs in CDP mode (connecting to an existing browser), the check still fails.
### WebUI side
The WebUI already passes `CLI_TOOLSETS` correctly per-request. The `enabled_toolsets` field in the cron/chat config is dynamic and works as intended.
### Fix required
The fix must happen in `hermes-agent/tools/browser_tool.py`:
- `check_browser_requirements()` should skip the Playwright check when CDP mode is configured
- Or add a `BROWSER_MODE=cdp` env var that bypasses the local browser requirement
### Workaround
Use `CLOUD_BROWSER=true` or configure `browser.base_url` to point to a remote CDP endpoint. This bypasses the local Playwright requirement.
+6 -1
View File
@@ -1072,9 +1072,14 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
S.session=session;S.messages=(session.messages||[]).filter(m=>m&&m.role);
const hasMessageToolMetadata=S.messages.some(m=>{
if(!m||m.role!=='assistant') return false;
// Recognize both the standard `tool_calls` (used by completed assistant
// turns where the LLM emitted tool_call entries) and the WebUI-internal
// `_partial_tool_calls` (used on Stop/Cancel partial messages — see
// api/streaming.py cancel_stream).
const hasTc=Array.isArray(m.tool_calls)&&m.tool_calls.length>0;
const hasPartialTc=Array.isArray(m._partial_tool_calls)&&m._partial_tool_calls.length>0;
const hasTu=Array.isArray(m.content)&&m.content.some(p=>p&&p.type==='tool_use');
return hasTc||hasTu;
return hasTc||hasPartialTc||hasTu;
});
if(!hasMessageToolMetadata&&session.tool_calls&&session.tool_calls.length){
S.toolCalls=(session.tool_calls||[]).map(tc=>({...tc,done:true}));
+69
View File
@@ -0,0 +1,69 @@
"""Regression tests for manual WebUI cron runs."""
import sys
import types
def test_manual_cron_run_saves_output_and_marks_job(monkeypatch):
import api.routes as routes
calls = []
cron_pkg = types.ModuleType("cron")
cron_pkg.__path__ = []
cron_jobs = types.ModuleType("cron.jobs")
cron_jobs.save_job_output = lambda job_id, output: calls.append(
("save", job_id, output)
)
cron_jobs.mark_job_run = lambda job_id, success, error=None: calls.append(
("mark", job_id, success, error)
)
cron_scheduler = types.ModuleType("cron.scheduler")
cron_scheduler.run_job = lambda job: (True, "manual output", "done", None)
monkeypatch.setitem(sys.modules, "cron", cron_pkg)
monkeypatch.setitem(sys.modules, "cron.jobs", cron_jobs)
monkeypatch.setitem(sys.modules, "cron.scheduler", cron_scheduler)
routes._mark_cron_running("job123")
routes._run_cron_tracked({"id": "job123"})
assert calls == [
("save", "job123", "manual output"),
("mark", "job123", True, None),
]
assert routes._is_cron_running("job123") == (False, 0.0)
def test_manual_cron_run_marks_empty_response_as_failure(monkeypatch):
import api.routes as routes
calls = []
cron_pkg = types.ModuleType("cron")
cron_pkg.__path__ = []
cron_jobs = types.ModuleType("cron.jobs")
cron_jobs.save_job_output = lambda job_id, output: calls.append(
("save", job_id, output)
)
cron_jobs.mark_job_run = lambda job_id, success, error=None: calls.append(
("mark", job_id, success, error)
)
cron_scheduler = types.ModuleType("cron.scheduler")
cron_scheduler.run_job = lambda job: (True, "manual output", "", None)
monkeypatch.setitem(sys.modules, "cron", cron_pkg)
monkeypatch.setitem(sys.modules, "cron.jobs", cron_jobs)
monkeypatch.setitem(sys.modules, "cron.scheduler", cron_scheduler)
routes._mark_cron_running("job-empty")
routes._run_cron_tracked({"id": "job-empty"})
assert calls[0] == ("save", "job-empty", "manual output")
assert calls[1][0:3] == ("mark", "job-empty", False)
assert "empty response" in calls[1][3]
assert routes._is_cron_running("job-empty") == (False, 0.0)
+3
View File
@@ -333,6 +333,9 @@ def test_compression_chain_collapses_to_latest_tip_in_sidebar():
# bubbles to the top by true recency, not by the root's stale activity.
# tip messages are at t0+201 and t0+202, so last_activity = t0 + 202.
assert abs(tip.get('updated_at') - (t0 + 202)) < 0.01
assert tip.get('_lineage_root_id') == 'chain_root_001'
assert tip.get('_lineage_tip_id') == 'chain_tip_001'
assert tip.get('_compression_segment_count') == 3
from api.agent_sessions import read_importable_agent_session_rows
@@ -0,0 +1,81 @@
"""Tests for issue #1195: sessions must route to the correct profile directory
even when that profile directory does not exist yet on disk."""
import os
import tempfile
from pathlib import Path
from unittest.mock import patch
import pytest
# ── helpers ──────────────────────────────────────────────────────────────────
def _make_hermes_home(base: Path, profile_name: str | None = None) -> Path:
"""Create a temp HERMES_HOME (with optional profile dir) and return it."""
hermes_home = base / ".hermes"
hermes_home.mkdir(parents=True, exist_ok=True)
if profile_name:
(hermes_home / "profiles" / profile_name).mkdir(parents=True, exist_ok=True)
return hermes_home
# ── tests ────────────────────────────────────────────────────────────────────
class TestGetHermesHomeForProfile:
"""get_hermes_home_for_profile() must return the profile path regardless of
whether the directory already exists on disk (#1195)."""
@pytest.fixture(autouse=True)
def _patch_default_home(self, tmp_path):
"""Patch _DEFAULT_HERMES_HOME to a temp directory for isolation."""
from api.profiles import _DEFAULT_HERMES_HOME as real_default
fake_home = tmp_path / ".hermes"
fake_home.mkdir(parents=True)
with patch("api.profiles._DEFAULT_HERMES_HOME", fake_home):
yield fake_home, real_default
def test_existing_profile_returns_profile_dir(self, _patch_default_home):
fake_home, _ = _patch_default_home
from api.profiles import get_hermes_home_for_profile
# Create an existing profile directory
profile_dir = fake_home / "profiles" / "ayan"
profile_dir.mkdir(parents=True)
result = get_hermes_home_for_profile("ayan")
assert result == profile_dir
def test_nonexistent_profile_still_returns_profile_path(self, _patch_default_home):
"""Core bug fix: profile dir doesn't exist yet but should still route there."""
fake_home, _ = _patch_default_home
from api.profiles import get_hermes_home_for_profile
# Do NOT create the profile directory
expected = fake_home / "profiles" / "newprofile"
assert not expected.exists() # confirm it doesn't exist
result = get_hermes_home_for_profile("newprofile")
assert result == expected, "Should route to profile path even when dir missing"
def test_none_returns_default(self, _patch_default_home):
fake_home, _ = _patch_default_home
from api.profiles import get_hermes_home_for_profile
result = get_hermes_home_for_profile(None)
assert result == fake_home
def test_empty_string_returns_default(self, _patch_default_home):
fake_home, _ = _patch_default_home
from api.profiles import get_hermes_home_for_profile
result = get_hermes_home_for_profile("")
assert result == fake_home
def test_default_string_returns_default(self, _patch_default_home):
fake_home, _ = _patch_default_home
from api.profiles import get_hermes_home_for_profile
result = get_hermes_home_for_profile("default")
assert result == fake_home
+317
View File
@@ -0,0 +1,317 @@
"""Regression tests for #1361 — Stop/Cancel discards already-streamed content.
Three distinct data-loss paths on cancel:
§A Reasoning text accumulated in a thread-local `_reasoning_text` is never
visible to cancel_stream(), so it's lost on cancel.
§B Live tool calls accumulated in thread-local `_live_tool_calls` are lost
on cancel only STREAM_PARTIAL_TEXT is captured.
§C When the entire streamed output is reasoning (no visible tokens),
_stripped is empty after regex cleanup, so NO partial assistant message
is appended only the *Task cancelled.* marker survives.
All three fix the same "tokens-paid-for-data-loss" class of bug.
"""
import pathlib
import queue
import re
import threading
from unittest.mock import Mock, patch
import pytest
import api.config as config
import api.models as models
import api.streaming as streaming
from api.models import Session
from api.streaming import cancel_stream
REPO_ROOT = pathlib.Path(__file__).parent.parent.resolve()
# ── Fixtures ────────────────────────────────────────────────────────────────
@pytest.fixture(autouse=True)
def _isolate_session_dir(tmp_path, monkeypatch):
"""Redirect SESSION_DIR / SESSION_INDEX_FILE to an isolated temp dir."""
session_dir = tmp_path / "sessions"
session_dir.mkdir()
index_file = session_dir / "_index.json"
monkeypatch.setattr(models, "SESSION_DIR", session_dir)
monkeypatch.setattr(models, "SESSION_INDEX_FILE", index_file)
models.SESSIONS.clear()
yield
models.SESSIONS.clear()
@pytest.fixture(autouse=True)
def _isolate_stream_state():
"""Clear all shared streaming dicts before/after each test."""
config.STREAMS.clear()
config.CANCEL_FLAGS.clear()
config.AGENT_INSTANCES.clear()
config.STREAM_PARTIAL_TEXT.clear()
# New shared dicts for §A and §B
if hasattr(config, 'STREAM_REASONING_TEXT'):
config.STREAM_REASONING_TEXT.clear()
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
config.STREAM_LIVE_TOOL_CALLS.clear()
yield
config.STREAMS.clear()
config.CANCEL_FLAGS.clear()
config.AGENT_INSTANCES.clear()
config.STREAM_PARTIAL_TEXT.clear()
if hasattr(config, 'STREAM_REASONING_TEXT'):
config.STREAM_REASONING_TEXT.clear()
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
config.STREAM_LIVE_TOOL_CALLS.clear()
@pytest.fixture(autouse=True)
def _isolate_agent_locks():
config.SESSION_AGENT_LOCKS.clear()
yield
config.SESSION_AGENT_LOCKS.clear()
def _make_session(session_id="cancel_sid_1361",
pending_msg="Help me debug this",
messages=None):
"""Build a session in mid-stream state."""
s = Session(
session_id=session_id,
title="Test Session",
messages=messages or [],
)
s.pending_user_message = pending_msg
s.pending_attachments = []
s.pending_started_at = None
s.active_stream_id = "stream_1361"
s.save()
models.SESSIONS[session_id] = s
return s
def _setup_cancel_state(session_id, stream_id="stream_1361"):
"""Wire up STREAMS/CANCEL_FLAGS/AGENT_INSTANCES for cancel_stream()."""
config.STREAMS[stream_id] = queue.Queue()
config.CANCEL_FLAGS[stream_id] = threading.Event()
mock_agent = Mock()
mock_agent.session_id = session_id
mock_agent.interrupt = Mock()
config.AGENT_INSTANCES[stream_id] = mock_agent
return stream_id, mock_agent
# ── §A: Reasoning text lost on cancel ───────────────────────────────────────
class TestCancelPreservesReasoningText:
"""§A: _reasoning_text is thread-local and invisible to cancel_stream().
After fix: reasoning text should be persisted in a shared dict
(STREAM_REASONING_TEXT) keyed by stream_id, and cancel_stream()
should append it as a 'reasoning' field on the partial assistant message.
"""
def test_cancel_with_reasoning_only_preserves_reasoning(self):
"""Cancel during reasoning phase (no visible tokens) should persist reasoning."""
sid = "test_1361_a1"
stream_id = "stream_a1"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
# Simulate: reasoning was accumulated but no visible tokens
reasoning = "Let me think about this step by step..."
config.STREAM_PARTIAL_TEXT[stream_id] = "" # no visible tokens
if hasattr(config, 'STREAM_REASONING_TEXT'):
config.STREAM_REASONING_TEXT[stream_id] = reasoning
cancel_stream(stream_id)
# Reload and check
s2 = models.SESSIONS[sid]
msgs = s2.messages
# There should be a partial assistant message with reasoning
assistant_msgs = [m for m in msgs if isinstance(m, dict) and m.get('role') == 'assistant']
has_reasoning = any(m.get('reasoning') for m in assistant_msgs)
assert has_reasoning, \
f"Expected reasoning field on partial assistant msg after cancel. Got messages: {assistant_msgs}"
def test_cancel_with_reasoning_and_partial_tokens_preserves_both(self):
"""Cancel mid-stream with both reasoning and some visible tokens."""
sid = "test_1361_a2"
stream_id = "stream_a2"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
reasoning = "Let me analyze the code..."
partial_text = "Based on my analysis, the bug is in the"
config.STREAM_PARTIAL_TEXT[stream_id] = partial_text
if hasattr(config, 'STREAM_REASONING_TEXT'):
config.STREAM_REASONING_TEXT[stream_id] = reasoning
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
# Should have partial content
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
has_content = any(m.get('content') for m in partial_msgs)
assert has_content, \
f"Expected partial assistant content after cancel. Got: {partial_msgs}"
def test_cancel_without_reasoning_dict_works_as_before(self):
"""If STREAM_REASONING_TEXT doesn't exist yet (pre-fix), cancel still works."""
sid = "test_1361_a3"
stream_id = "stream_a3"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
config.STREAM_PARTIAL_TEXT[stream_id] = "Some partial text"
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
msgs = s2.messages
# Should have the cancel marker
has_cancel = any(
isinstance(m, dict) and m.get('role') == 'assistant' and m.get('_error')
for m in msgs
)
assert has_cancel, "Cancel marker should always be present"
# ── §B: Tool calls lost on cancel ───────────────────────────────────────────
class TestCancelPreservesToolCalls:
"""§B: _live_tool_calls is thread-local and invisible to cancel_stream().
After fix: tool calls should be persisted in a shared dict
(STREAM_LIVE_TOOL_CALLS) keyed by stream_id, and cancel_stream()
should append them as tool_call entries on the partial assistant message.
"""
def test_cancel_with_tool_calls_preserves_tools(self):
"""Cancel after tool execution should preserve the tool call info."""
sid = "test_1361_b1"
stream_id = "stream_b1"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
config.STREAM_PARTIAL_TEXT[stream_id] = ""
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
{"name": "read_file", "args": {"path": "/tmp/test.py"}, "done": True},
{"name": "terminal", "args": {"command": "ls"}, "done": False},
]
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
has_tools = any(m.get('_partial_tool_calls') or m.get('tool_calls') or m.get('tools') for m in assistant_msgs)
assert has_tools, \
f"Expected _partial_tool_calls on partial assistant msg after cancel. Got: {assistant_msgs}"
def test_cancel_with_tools_and_text_preserves_both(self):
"""Cancel after tools + partial text should keep both."""
sid = "test_1361_b2"
stream_id = "stream_b2"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
config.STREAM_PARTIAL_TEXT[stream_id] = "Here's what I found:"
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
{"name": "web_search", "args": {"query": "test"}, "done": True},
]
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
has_content = any(m.get('content') for m in partial_msgs)
assert has_content, \
f"Expected partial content with tools after cancel. Got: {partial_msgs}"
# ── §C: Empty _stripped skips entire append ─────────────────────────────────
class TestCancelWithReasoningOnlyNoText:
"""§C: When streaming was 100% reasoning (no visible tokens), _stripped is
empty after regex cleanup, so no partial assistant message is appended.
After fix: even when _stripped is empty, if reasoning or tool calls exist,
a partial assistant message should be appended (with no content, but with
reasoning and/or tool_calls fields).
"""
def test_reasoning_only_creates_partial_message(self):
"""Cancel after reasoning-only output should still create a partial msg."""
sid = "test_1361_c1"
stream_id = "stream_c1"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
# Only reasoning, no visible tokens at all
config.STREAM_PARTIAL_TEXT[stream_id] = ""
if hasattr(config, 'STREAM_REASONING_TEXT'):
config.STREAM_REASONING_TEXT[stream_id] = "Deep reasoning here..."
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
# Should NOT be only the cancel marker — there should be a partial msg
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
assert len(partial_msgs) > 0, \
f"Expected at least one partial assistant msg for reasoning-only cancel. Got: {assistant_msgs}"
def test_tools_only_creates_partial_message(self):
"""Cancel after tool-only output (no text, no reasoning) should still create a partial msg."""
sid = "test_1361_c2"
stream_id = "stream_c2"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
config.STREAM_PARTIAL_TEXT[stream_id] = ""
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
{"name": "read_file", "args": {"path": "/tmp/x"}, "done": True},
]
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
assert len(partial_msgs) > 0, \
f"Expected at least one partial assistant msg for tools-only cancel. Got: {assistant_msgs}"
def test_no_reasoning_no_tools_no_partial(self):
"""Cancel with no reasoning and no tools and no text = only cancel marker (no change)."""
sid = "test_1361_c3"
stream_id = "stream_c3"
s = _make_session(session_id=sid)
_setup_cancel_state(sid, stream_id)
config.STREAM_PARTIAL_TEXT[stream_id] = ""
cancel_stream(stream_id)
s2 = models.SESSIONS[sid]
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
# Should only have the cancel marker, no partial messages
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
cancel_msgs = [m for m in assistant_msgs if m.get('_error')]
assert len(partial_msgs) == 0, \
f"Expected no partial msg when nothing was streamed. Got partials: {partial_msgs}"
assert len(cancel_msgs) == 1, \
f"Expected exactly 1 cancel marker. Got: {cancel_msgs}"
+31 -9
View File
@@ -40,13 +40,18 @@ def test_get_hermes_home_for_profile_returns_profile_subdir(tmp_path, monkeypatc
assert result == profile_dir
def test_get_hermes_home_for_profile_falls_back_for_missing_profile(tmp_path, monkeypatch):
"""R19c: Named profile that does not exist falls back to base home."""
def test_get_hermes_home_for_profile_returns_profile_path_for_missing_profile(tmp_path, monkeypatch):
"""R19c: Named profile that does not exist on disk now returns the
profile-scoped path (created on first use by the agent layer), NOT the
base home. Tightened in v0.50.251 / PR #1373 to fix #1195: the previous
is_dir() fallback caused new profiles to silently route every session
back to the default profile until the directory existed on disk.
Path traversal is still blocked by the _PROFILE_ID_RE regex (R19j)."""
import api.profiles as p
monkeypatch.setattr(p, '_DEFAULT_HERMES_HOME', tmp_path)
result = p.get_hermes_home_for_profile('ghost')
assert result == tmp_path
assert result == tmp_path / 'profiles' / 'ghost'
def test_get_hermes_home_for_profile_does_not_mutate_globals():
@@ -171,8 +176,10 @@ def test_sessions_js_sends_profile_in_new_session_post():
def test_get_hermes_home_for_profile_rejects_path_traversal():
"""R19j: get_hermes_home_for_profile() must reject names that don't match
_PROFILE_ID_RE (e.g. path traversal like '../../etc') and return the base home.
The regex guard is defence-in-depth on top of the is_dir() fallback."""
_PROFILE_ID_RE (e.g. path traversal like '../../etc') and return the base
home. After v0.50.251 / PR #1373 removed the is_dir() fallback, the regex
is the SOLE guard against path traversal verify each known-bad shape
still returns the base home, not a traversed path."""
import api.profiles as p
base = p._DEFAULT_HERMES_HOME
assert p.get_hermes_home_for_profile('../../etc') == base
@@ -180,7 +187,22 @@ def test_get_hermes_home_for_profile_rejects_path_traversal():
assert p.get_hermes_home_for_profile('/absolute/path') == base
assert p.get_hermes_home_for_profile('has spaces') == base
assert p.get_hermes_home_for_profile('UPPERCASE') == base
# Valid names still work
assert p.get_hermes_home_for_profile('alice') == base # nonexistent → fallback
assert p.get_hermes_home_for_profile('my-profile') == base
assert p.get_hermes_home_for_profile('profile_1') == base
# Valid names now route to the profile-scoped path (created on first use).
# Previously these returned `base` because no profile dir existed on disk.
assert p.get_hermes_home_for_profile('alice') == base / 'profiles' / 'alice'
assert p.get_hermes_home_for_profile('my-profile') == base / 'profiles' / 'my-profile'
assert p.get_hermes_home_for_profile('profile_1') == base / 'profiles' / 'profile_1'
# R19j coverage gaps closed in v0.50.251 per Opus pre-release review:
# - Trailing-newline names must be rejected (re.match would let them through;
# re.fullmatch correctly anchors $). Catches the match-vs-fullmatch footgun.
assert p.get_hermes_home_for_profile('valid\n') == base
assert p.get_hermes_home_for_profile('a\n') == base
# - Length boundaries: 64 chars (max valid: 1 + 63 suffix) routes to profile path,
# 65 chars rejected.
assert p.get_hermes_home_for_profile('a' * 64) == base / 'profiles' / ('a' * 64)
assert p.get_hermes_home_for_profile('a' * 65) == base
# - Single-char name is the minimum valid form.
assert p.get_hermes_home_for_profile('a') == base / 'profiles' / 'a'
# - Non-ASCII / Unicode-trick names are rejected by the ASCII-only charset.
assert p.get_hermes_home_for_profile('voilà') == base
assert p.get_hermes_home_for_profile('名前') == base
@@ -0,0 +1,290 @@
"""Regression tests for the v0.50.251 pre-release fixes to PR #1370.
PR #1370 originally did `SELECT id, parent_session_id, end_reason FROM
sessions` (full table scan) on every call. At ~1000 rows the indexed
lookup was 50x faster; the pre-release fix replaces the full scan with
a parameterized `WHERE id IN (...)` query that hits PRIMARY KEY +
idx_sessions_parent.
Also pins: orphan-parent references (parent_session_id points to a
row that doesn't exist in state.db) must NOT be exposed in the API
output, because #1358's frontend `_sessionLineageKey` falls through to
`parent_session_id` when `_lineage_root_id` is missing and would
group orphans by a key no other session shares (cosmetic dead state).
"""
import os
import sqlite3
import time
import pytest
def _make_db(path):
conn = sqlite3.connect(str(path))
conn.executescript("""
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
source TEXT,
title TEXT,
model TEXT,
started_at REAL NOT NULL,
message_count INTEGER DEFAULT 0,
parent_session_id TEXT,
ended_at REAL,
end_reason TEXT
);
CREATE INDEX idx_sessions_parent ON sessions(parent_session_id);
""")
return conn
def _insert(conn, sid, *, parent=None, end_reason=None, started_at=None):
conn.execute(
"INSERT INTO sessions (id, source, title, model, started_at, parent_session_id, end_reason) "
"VALUES (?, 'webui', ?, 'openai/gpt-5', ?, ?, ?)",
(sid, sid, started_at or time.time(), parent, end_reason),
)
conn.commit()
def test_does_not_full_scan_sessions_table(tmp_path, monkeypatch):
"""The perf-critical invariant: must NOT do a `SELECT * FROM sessions`
or `SELECT id, parent_session_id, end_reason FROM sessions` without a
WHERE clause. Full scans regress sidebar refresh latency on power users
with 1000+ session rows.
"""
from api import agent_sessions
db = tmp_path / "state.db"
conn = _make_db(db)
# Insert 500 unrelated rows + 5 we care about
for i in range(500):
_insert(conn, f"unrelated_{i:04d}")
_insert(conn, "wanted_1")
_insert(conn, "wanted_2", parent="wanted_1", end_reason=None)
conn.close()
# Track every SQL the function issues
queries = []
real_connect = sqlite3.connect
class _TrackingConn:
def __init__(self, *args, **kw):
self._real = real_connect(*args, **kw)
def cursor(self):
return _TrackingCursor(self._real.cursor())
def __enter__(self):
return self
def __exit__(self, *a):
return self._real.__exit__(*a)
@property
def row_factory(self):
return self._real.row_factory
@row_factory.setter
def row_factory(self, v):
self._real.row_factory = v
class _TrackingCursor:
def __init__(self, real):
self._real = real
def execute(self, sql, *args):
queries.append(sql)
return self._real.execute(sql, *args)
def fetchall(self):
return self._real.fetchall()
def fetchone(self):
return self._real.fetchone()
monkeypatch.setattr(sqlite3, "connect", _TrackingConn)
agent_sessions.read_session_lineage_metadata(db, ["wanted_1", "wanted_2"])
# No query may select from sessions without a WHERE clause
bad = [q for q in queries
if "from sessions" in q.lower()
and "where" not in q.lower()
and "pragma" not in q.lower()]
assert not bad, (
f"read_session_lineage_metadata must scope its SELECTs to specific "
f"session ids (PR #1370 originally did a full scan). Found unscoped "
f"queries: {bad}"
)
def test_orphan_parent_reference_not_exposed_in_metadata(tmp_path):
"""If a session row references a parent that doesn't exist in state.db
(orphan), the API output must NOT include `parent_session_id` because
#1358's frontend lineage helper would treat it as a sidebar grouping key
and cluster the orphan into a never-collapsing single-row group.
"""
from api.agent_sessions import read_session_lineage_metadata
db = tmp_path / "state.db"
conn = _make_db(db)
_insert(conn, "child", parent="missing_parent", end_reason=None)
conn.close()
result = read_session_lineage_metadata(db, ["child"])
assert "child" not in result or "parent_session_id" not in result["child"], (
f"Orphan parent_session_id leaked into API output: {result}. "
f"This causes the frontend lineage helper to group the orphan under "
f"a key no other session shares — cosmetic dead state on the wire."
)
def test_cycle_in_parent_chain_terminates(tmp_path):
"""Pathological data with a parent cycle (A→B→A) must not infinite-loop."""
from api.agent_sessions import read_session_lineage_metadata
db = tmp_path / "state.db"
conn = _make_db(db)
_insert(conn, "A", parent="B", end_reason="compression")
_insert(conn, "B", parent="A", end_reason="compression")
conn.close()
import threading
done = threading.Event()
result_box = []
def worker():
result_box.append(read_session_lineage_metadata(db, ["A"]))
done.set()
t = threading.Thread(target=worker, daemon=True)
t.start()
finished = done.wait(timeout=2.0)
assert finished, "read_session_lineage_metadata hung on a parent cycle"
result = result_box[0]
# Cycle should terminate; segment count should be bounded (≤ 2 for A→B→A)
seg = result.get("A", {}).get("_compression_segment_count", 0)
assert 1 <= seg <= 5, f"Cycle should produce a small bounded count, got {seg}"
def test_chain_walk_returns_correct_root_for_real_compression_chain(tmp_path):
"""End-to-end behavioural test: 4-segment compression chain returns the
correct root and a segment count of 4."""
from api.agent_sessions import read_session_lineage_metadata
db = tmp_path / "state.db"
conn = _make_db(db)
_insert(conn, "root", end_reason="compression")
_insert(conn, "seg2", parent="root", end_reason="compression")
_insert(conn, "seg3", parent="seg2", end_reason="compression")
_insert(conn, "tip", parent="seg3", end_reason=None)
conn.close()
result = read_session_lineage_metadata(db, ["tip"])
assert result["tip"]["_lineage_root_id"] == "root"
assert result["tip"]["_compression_segment_count"] == 4
assert result["tip"]["parent_session_id"] == "seg3"
def test_in_clause_chunked_for_large_session_set(tmp_path, monkeypatch):
"""The first hop must chunk the IN clause into batches of <= 500.
Without chunking, a power user with 2000+ sessions in the sidebar would
trigger SQLITE_MAX_VARIABLE_NUMBER on Python 3.9's sqlite 3.31 (default
limit 999), the OperationalError gets swallowed by the except wrapper,
and lineage collapse silently disables forever for that user.
(Opus pre-release review of v0.50.251, SHOULD-FIX 2.)
"""
from api import agent_sessions
db = tmp_path / "state.db"
conn = _make_db(db)
# 1500 unrelated rows
for i in range(1500):
_insert(conn, f"session_{i:04d}")
conn.close()
# Track each query's IN clause variable count
in_counts = []
real_connect = sqlite3.connect
class _TrackingConn:
def __init__(self, *args, **kw):
self._real = real_connect(*args, **kw)
def cursor(self):
return _TrackingCursor(self._real.cursor())
def __enter__(self): return self
def __exit__(self, *a): return self._real.__exit__(*a)
@property
def row_factory(self): return self._real.row_factory
@row_factory.setter
def row_factory(self, v): self._real.row_factory = v
class _TrackingCursor:
def __init__(self, real): self._real = real
def execute(self, sql, *args):
if "IN (" in sql:
in_counts.append(sql.count("?"))
return self._real.execute(sql, *args)
def fetchall(self): return self._real.fetchall()
def fetchone(self): return self._real.fetchone()
monkeypatch.setattr(sqlite3, "connect", _TrackingConn)
# Request 1500 ids (the full set we inserted)
wanted = [f"session_{i:04d}" for i in range(1500)]
agent_sessions.read_session_lineage_metadata(db, wanted)
assert in_counts, "Expected at least one IN-clause query"
over_limit = [n for n in in_counts if n > 999]
assert not over_limit, (
f"IN clause must be chunked to <= 999 vars to stay under SQLite default "
f"limit (chunked to 500 in the implementation). Found queries with "
f"{over_limit} variables — would raise OperationalError on older sqlite."
)
def test_two_children_sharing_non_continuation_parent_not_collapsed(tmp_path):
"""The contract Opus flagged: when two distinct WebUI sessions share the
same parent and that parent's end_reason is NOT compression/cli_close
(e.g. 'user_stop'), neither child should expose parent_session_id.
The frontend's #1358 lineage helper would otherwise group them under
the parent's id key and incorrectly collapse the rows.
"""
from api.agent_sessions import read_session_lineage_metadata
db = tmp_path / "state.db"
conn = _make_db(db)
_insert(conn, "shared_parent", end_reason="user_stop")
_insert(conn, "child_a", parent="shared_parent", end_reason=None)
_insert(conn, "child_b", parent="shared_parent", end_reason=None)
conn.close()
result = read_session_lineage_metadata(db, ["child_a", "child_b"])
# Neither child should expose parent_session_id (would let frontend
# group them under the same key and collapse one)
for sid in ["child_a", "child_b"]:
entry = result.get(sid, {})
assert "parent_session_id" not in entry, (
f"{sid} leaked parent_session_id despite parent being non-continuation"
)
def test_non_compression_parent_does_not_extend_lineage(tmp_path):
"""If parent's end_reason is something OTHER than 'compression' or
'cli_close' (e.g. 'user_stop', 'session_reset', 'cron_complete'),
the chain stops at that boundary AND parent_session_id is not exposed
(tightened in v0.50.251 per Opus SHOULD-FIX 1 exposing the parent
ref would let the frontend group two children sharing the same
non-continuation parent into a single sidebar row).
"""
from api.agent_sessions import read_session_lineage_metadata
db = tmp_path / "state.db"
conn = _make_db(db)
_insert(conn, "parent", end_reason="user_stop") # NOT compression
_insert(conn, "child", parent="parent", end_reason=None)
conn.close()
result = read_session_lineage_metadata(db, ["child"])
# parent_session_id must NOT be exposed (parent's end_reason is non-continuation)
assert "child" not in result or "parent_session_id" not in result["child"], (
"parent_session_id leaked through despite parent's end_reason being "
"'user_stop' — would cause incorrect sidebar collapse."
)
# _lineage_root_id should NOT be set — chain doesn't span the boundary
assert "child" not in result or "_lineage_root_id" not in result["child"]
assert "child" not in result or "_compression_segment_count" not in result["child"]
@@ -0,0 +1,105 @@
"""Regression test for the v0.50.251 Opus pass-2 MUST-FIX on PR #1375.
Original PR #1375 stored cancelled tool calls under the message key
`tool_calls`. That key is whitelisted by `_API_SAFE_MSG_KEYS` so the
sanitize-for-API path forwarded them to the next-turn LLM call. But the
captured entries use the WebUI internal shape ({name, args, done,
duration, is_error}) they don't have OpenAI/Anthropic's id +
function: {name, arguments} envelope. Strict providers (OpenAI,
Anthropic, Z.AI/GLM) would 400 on the malformed entries turning a
"data lost on cancel" bug into a "next message returns 400" bug.
The fix renames the key to `_partial_tool_calls` (underscore-prefixed
private key NOT in the whitelist), so sanitize correctly strips it.
The UI reads it via static/messages.js.
This test pins the invariant: a partial assistant message with
`_partial_tool_calls` set must produce ZERO `tool_calls` after
sanitize-for-API.
"""
import pathlib
import sys
import pytest
REPO_ROOT = pathlib.Path(__file__).parent.parent.resolve()
sys.path.insert(0, str(REPO_ROOT))
def test_partial_tool_calls_field_not_forwarded_to_llm():
"""The `_partial_tool_calls` field must not survive _sanitize_messages_for_api.
Otherwise the malformed entries get sent to the LLM and cause 400 errors."""
from api.streaming import _sanitize_messages_for_api
messages = [
{"role": "user", "content": "do a search"},
{
"role": "assistant",
"content": "Looking up...",
"_partial": True,
"_partial_tool_calls": [
{"name": "web_search", "args": {"query": "x"}, "done": False},
],
"reasoning": "Let me think about this",
},
]
sanitized = _sanitize_messages_for_api(messages)
# The partial assistant message must NOT have _partial_tool_calls (private key).
# It must NOT have tool_calls (would bypass the rename and 400 the LLM).
# It must NOT have reasoning (not in whitelist).
assistant_msgs = [m for m in sanitized if m.get("role") == "assistant"]
assert assistant_msgs, "Sanitized output must include the assistant message"
for m in assistant_msgs:
assert "_partial_tool_calls" not in m, (
"Sanitize-for-API must strip _partial_tool_calls — it's a UI-only key. "
f"Got: {m}"
)
assert "tool_calls" not in m, (
"Sanitize-for-API must NOT have tool_calls on a partial message — the "
"captured entries use WebUI shape and would 400 on strict providers. "
f"Got: {m}"
)
assert "reasoning" not in m, (
"Sanitize-for-API must strip reasoning (not in _API_SAFE_MSG_KEYS). "
f"Got: {m}"
)
def test_legitimate_tool_calls_are_preserved_for_completed_turns():
"""Completed assistant turns with REAL tool_calls (with id + function envelope)
must still pass through sanitize unchanged. The rename only affects
cancel-partial messages, not normal completed turns."""
from api.streaming import _sanitize_messages_for_api
messages = [
{"role": "user", "content": "search"},
{
"role": "assistant",
"content": "I'll search.",
"tool_calls": [
{
"id": "call_abc",
"type": "function",
"function": {"name": "web_search", "arguments": '{"query":"x"}'}
},
],
},
{
"role": "tool",
"tool_call_id": "call_abc",
"name": "web_search",
"content": "result",
},
]
sanitized = _sanitize_messages_for_api(messages)
assistant_msgs = [m for m in sanitized if m.get("role") == "assistant"]
assert assistant_msgs, "Sanitized output must include the assistant message"
assert assistant_msgs[0].get("tool_calls"), (
"Legitimate tool_calls on completed turns must survive sanitize. "
f"Got: {assistant_msgs[0]}"
)
# Must still have the OpenAI envelope shape
tc = assistant_msgs[0]["tool_calls"][0]
assert "id" in tc and "function" in tc, (
f"tool_calls envelope must be preserved. Got: {tc}"
)
+161
View File
@@ -0,0 +1,161 @@
"""Regression tests for /api/sessions lineage metadata used by sidebar collapse."""
import sqlite3
import time
import pytest
import api.models as models
from api.models import SESSIONS, STREAMS, Session, all_sessions
@pytest.fixture(autouse=True)
def _isolate(tmp_path, monkeypatch):
session_dir = tmp_path / "sessions"
session_dir.mkdir()
index_file = session_dir / "_index.json"
state_db = tmp_path / "state.db"
monkeypatch.setattr(models, "SESSION_DIR", session_dir)
monkeypatch.setattr(models, "SESSION_INDEX_FILE", index_file)
monkeypatch.setattr(models, "_active_state_db_path", lambda: state_db)
SESSIONS.clear()
STREAMS.clear()
yield state_db
SESSIONS.clear()
STREAMS.clear()
def _ensure_state_db(path):
conn = sqlite3.connect(str(path))
conn.executescript(
"""
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
source TEXT,
title TEXT,
model TEXT,
started_at REAL NOT NULL,
message_count INTEGER DEFAULT 0,
parent_session_id TEXT,
ended_at REAL,
end_reason TEXT
);
"""
)
return conn
def _insert_state_row(conn, sid, *, parent=None, ended_at=None, end_reason=None, started_at=None):
conn.execute(
"""
INSERT INTO sessions
(id, source, title, model, started_at, message_count, parent_session_id, ended_at, end_reason)
VALUES (?, 'webui', ?, 'openai/gpt-5', ?, 2, ?, ?, ?)
""",
(sid, sid, started_at or time.time(), parent, ended_at, end_reason),
)
conn.commit()
def _save_webui_session(sid, *, title, updated_at):
session = Session(
session_id=sid,
title=title,
messages=[{"role": "user", "content": "hello"}, {"role": "assistant", "content": "hi"}],
updated_at=updated_at,
)
session.save(touch_updated_at=False)
return session
def test_all_sessions_exposes_state_db_lineage_metadata_for_webui_json_sessions(_isolate):
"""PR #1358 can only collapse rows when /api/sessions exposes lineage keys."""
conn = _ensure_state_db(_isolate)
t0 = time.time() - 100
try:
_save_webui_session("lineage_api_root", title="Hermes WebUI", updated_at=t0)
_save_webui_session("lineage_api_tip", title="Hermes WebUI #2", updated_at=t0 + 10)
_insert_state_row(
conn,
"lineage_api_root",
started_at=t0,
ended_at=t0 + 5,
end_reason="compression",
)
_insert_state_row(
conn,
"lineage_api_tip",
parent="lineage_api_root",
started_at=t0 + 6,
)
rows = {row["session_id"]: row for row in all_sessions()}
assert rows["lineage_api_tip"].get("parent_session_id") == "lineage_api_root"
assert rows["lineage_api_tip"].get("_lineage_root_id") == "lineage_api_root"
assert rows["lineage_api_tip"].get("_compression_segment_count") == 2
assert "_lineage_root_id" not in rows["lineage_api_root"]
finally:
conn.close()
def test_non_compression_state_db_parent_does_not_create_sidebar_lineage(_isolate):
conn = _ensure_state_db(_isolate)
t0 = time.time() - 100
try:
_save_webui_session("lineage_api_plain_parent", title="Parent", updated_at=t0)
_save_webui_session("lineage_api_plain_child", title="Child", updated_at=t0 + 10)
_insert_state_row(
conn,
"lineage_api_plain_parent",
started_at=t0,
ended_at=t0 + 5,
end_reason="user_stop",
)
_insert_state_row(
conn,
"lineage_api_plain_child",
parent="lineage_api_plain_parent",
started_at=t0 + 6,
)
rows = {row["session_id"]: row for row in all_sessions()}
# parent_session_id must NOT be exposed when the parent's end_reason
# is not a continuation (compression / cli_close). The frontend's
# _sessionLineageKey would otherwise group two children sharing a
# `user_stop` parent under the same key — incorrect collapse.
# (Tightened in v0.50.251 per Opus SHOULD-FIX 1.)
assert "parent_session_id" not in rows["lineage_api_plain_child"]
assert "_lineage_root_id" not in rows["lineage_api_plain_child"]
finally:
conn.close()
def test_cli_close_parent_preserves_cross_surface_continuation_lineage(_isolate):
conn = _ensure_state_db(_isolate)
t0 = time.time() - 100
try:
_save_webui_session("lineage_api_cli_parent", title="Hermes WebUI #8", updated_at=t0)
_save_webui_session("lineage_api_webui_child", title="Hermes WebUI #8", updated_at=t0 + 10)
_insert_state_row(
conn,
"lineage_api_cli_parent",
started_at=t0,
ended_at=t0 + 5,
end_reason="cli_close",
)
_insert_state_row(
conn,
"lineage_api_webui_child",
parent="lineage_api_cli_parent",
started_at=t0 + 6,
)
rows = {row["session_id"]: row for row in all_sessions()}
assert rows["lineage_api_webui_child"].get("parent_session_id") == "lineage_api_cli_parent"
assert rows["lineage_api_webui_child"].get("_lineage_root_id") == "lineage_api_cli_parent"
finally:
conn.close()