mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-27 04:00:37 +00:00
Merge pull request #1371 from nesquena/release/v0.50.251
release: v0.50.251
This commit is contained in:
@@ -2,6 +2,15 @@
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [v0.50.251] — 2026-04-30
|
||||
|
||||
### Fixed
|
||||
- **Sidebar lineage collapse now works for WebUI JSON sessions, not just imported gateway rows** — PR #1358 (v0.50.249) added the client-side lineage-collapse helper but `/api/sessions` only included `_lineage_root_id` for gateway-imported rows. WebUI JSON sessions (the common case) had no grouping key, so cross-surface continuation chains (CLI-close → WebUI continuation, or compression chains within WebUI) still rendered as separate sidebar rows. Now `/api/sessions` reads `parent_session_id` and `end_reason` from `state.db.sessions` for every WebUI session id in the sidebar payload, walks the parent chain when `end_reason in {'compression', 'cli_close'}`, and exposes `_lineage_root_id` + `_compression_segment_count`. Cycle-detected via a `seen` set; depth-bounded to 20 hops to cap pathological data. **Pre-release fix:** swapped the original full-table-scan for a parameterized `WHERE id IN (...)` query that hits PRIMARY KEY + `idx_sessions_parent` — ~50× faster at 1000 rows, scales linearly. **Pre-release fix:** chunked IN clause to 500 vars to stay under SQLITE_MAX_VARIABLE_NUMBER on older sqlite (Python 3.9 ships sqlite 3.31 with default limit 999) — without this a power user with 2000+ sessions in the sidebar would hit `OperationalError: too many SQL variables`, the silent except-wrapper would swallow it, and lineage collapse would never work for them. **Pre-release fix:** tightened `parent_session_id` exposure — only emitted when the parent's `end_reason` is `compression` or `cli_close` (not for `user_stop`/etc), since the frontend's `_sessionLineageKey` falls through to `parent_session_id` and would incorrectly collapse two children of a non-continuation parent into a single row. (`api/agent_sessions.py`, `api/models.py`, `tests/test_session_lineage_metadata_api.py`, `tests/test_pr1370_lineage_metadata_perf_and_orphan.py`, `tests/test_gateway_sync.py`) @dso2ng — PR #1370
|
||||
- **Manual cron runs persist output and metadata like scheduled runs** — manual WebUI cron runs called `cron.scheduler.run_job(job)` and then only cleared the in-memory running flag. The job's output was dropped (never written via `save_job_output`) and `last_run_at` / `last_status` were never updated. Now the manual-run wrapper (`_run_cron_tracked`) matches the scheduled-cron path at `cron/scheduler.py:1334-1364` exactly: saves output, marks the job complete, treats empty `final_response` as a soft failure (with the same error string), and records failures via `mark_job_run(False, str(e))`. (`api/routes.py`, `tests/test_cron_manual_run_persistence.py`) @NocGeek — PR #1372 (split out from the held #1352 per pre-release feedback)
|
||||
- **Reasoning trace, tool calls, and partial output preserved on Stop/Cancel** — three distinct data-loss paths fixed: §A reasoning text accumulated in a thread-local `_reasoning_text` was invisible to `cancel_stream()` because it went out of scope when the thread was interrupted; §B live tool calls in thread-local `_live_tool_calls` were similarly lost; §C reasoning-only streams (no visible tokens) produced no partial assistant message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. The fix mirrors the existing `STREAM_PARTIAL_TEXT` pattern (#893) by adding two new shared dicts (`STREAM_REASONING_TEXT`, `STREAM_LIVE_TOOL_CALLS`) populated during streaming and read by `cancel_stream()`. The cancel path now appends the partial assistant message when content text, reasoning trace, OR tool calls exist (not just text). Eliminates "paid tokens disappeared" reports on Stop. 8 regression tests covering all three sections plus tools+text combinations. (`api/config.py`, `api/streaming.py`, `tests/test_issue1361_cancel_data_loss.py`) @bergeouss — PR #1375, fixes #1361
|
||||
- **New profiles route sessions to the profile dir on first use, not back to default** — `get_hermes_home_for_profile()` had a `if profile_dir.is_dir(): return profile_dir; return _DEFAULT_HERMES_HOME` fallback. New profiles (no session yet, so no dir) routed every session back to default until the directory existed on disk — making profile switching silently broken for the first session of every new profile. Removed the `is_dir()` guard; the profile path is now returned unconditionally and the directory is created on first use by the agent/session layer. Path traversal is still blocked by the `_PROFILE_ID_RE` regex (`^[a-z0-9][a-z0-9_-]{0,63}$`); R19j tests were updated to pin that the regex is now the sole defense. R19c was tightened to assert the new behavior. 5 regression tests in `test_issue1195_session_profile_routing.py` covering existing-profile, non-existent-profile (the core fix), None, empty-string, and 'default' return paths. (`api/profiles.py`, `tests/test_issue798.py`, `tests/test_issue1195_session_profile_routing.py`) @bergeouss — PR #1373, fixes #1195
|
||||
|
||||
|
||||
## [v0.50.250] — 2026-04-30
|
||||
|
||||
### Fixed
|
||||
|
||||
@@ -255,3 +255,120 @@ def read_importable_agent_session_rows(
|
||||
if limit is None:
|
||||
return projected
|
||||
return projected[:max(0, int(limit))]
|
||||
|
||||
|
||||
|
||||
def read_session_lineage_metadata(db_path: Path, session_ids: list[str] | set[str]) -> dict[str, dict]:
|
||||
"""Return compression-lineage metadata for known WebUI sidebar sessions.
|
||||
|
||||
WebUI sessions are persisted as JSON files, but Hermes Agent also mirrors
|
||||
them into ``state.db.sessions`` for insights/session history. Compression
|
||||
and cross-surface continuation create parent chains there. ``/api/sessions``
|
||||
needs to surface that lineage to the sidebar so client-side collapse can
|
||||
group logical continuations without mutating or deleting any session files.
|
||||
|
||||
Missing DBs, old schemas, or incomplete rows degrade to an empty mapping.
|
||||
"""
|
||||
wanted = {str(sid) for sid in (session_ids or []) if sid}
|
||||
db_path = Path(db_path)
|
||||
if not wanted or not db_path.exists():
|
||||
return {}
|
||||
|
||||
try:
|
||||
with sqlite3.connect(str(db_path)) as conn:
|
||||
conn.row_factory = sqlite3.Row
|
||||
cur = conn.cursor()
|
||||
cur.execute("PRAGMA table_info(sessions)")
|
||||
session_cols = {row[1] for row in cur.fetchall()}
|
||||
if 'parent_session_id' not in session_cols or 'end_reason' not in session_cols:
|
||||
return {}
|
||||
# Scoped fetch via PRIMARY KEY + idx_sessions_parent rather than a
|
||||
# full table scan. The sessions table grows unbounded over time
|
||||
# (1000+ rows is normal, 10000+ for power users), and this function
|
||||
# runs on every sidebar refresh — a full SELECT was ~50x slower
|
||||
# than the indexed lookup at 1000 rows and scales linearly.
|
||||
#
|
||||
# Fetch the wanted ids first, then chase parent_session_id chains
|
||||
# in batches until no new ids appear. Each batch hits PRIMARY KEY
|
||||
# so it's effectively O(N) lookups.
|
||||
#
|
||||
# IN-clause is chunked to 500 to stay under SQLITE_MAX_VARIABLE_NUMBER
|
||||
# on older sqlite (Python 3.9 ships sqlite 3.31 which defaults to 999;
|
||||
# newer Python ships sqlite 3.32+ at 32766). On a power user with
|
||||
# 2000+ sessions in the sidebar, an unchunked first hop would raise
|
||||
# `OperationalError: too many SQL variables`, get swallowed by the
|
||||
# except below, and silently disable lineage collapse forever.
|
||||
# (Opus pre-release review of v0.50.251, SHOULD-FIX 2.)
|
||||
IN_CHUNK = 500
|
||||
rows: dict[str, dict] = {}
|
||||
to_fetch = set(wanted)
|
||||
# Cap walk depth to bound worst-case query count. Real lineage
|
||||
# chains seen in production are <10 segments; anything longer is
|
||||
# almost certainly pathological data and not worth chasing.
|
||||
for _hop in range(20):
|
||||
if not to_fetch:
|
||||
break
|
||||
fetch_list = list(to_fetch)
|
||||
to_fetch = set()
|
||||
for i in range(0, len(fetch_list), IN_CHUNK):
|
||||
chunk = fetch_list[i:i + IN_CHUNK]
|
||||
placeholders = ','.join('?' * len(chunk))
|
||||
cur.execute(
|
||||
f"SELECT id, parent_session_id, end_reason FROM sessions WHERE id IN ({placeholders})",
|
||||
chunk,
|
||||
)
|
||||
for row in cur.fetchall():
|
||||
rows[row['id']] = dict(row)
|
||||
# Queue up parents we haven't fetched yet.
|
||||
for sid in fetch_list:
|
||||
parent_id = rows.get(sid, {}).get('parent_session_id')
|
||||
if parent_id and parent_id not in rows and parent_id not in to_fetch:
|
||||
to_fetch.add(parent_id)
|
||||
except Exception:
|
||||
return {}
|
||||
|
||||
metadata: dict[str, dict] = {}
|
||||
for sid in wanted:
|
||||
row = rows.get(sid)
|
||||
if not row:
|
||||
continue
|
||||
|
||||
parent_id = row.get('parent_session_id')
|
||||
# Only expose parent_session_id when:
|
||||
# 1) the parent actually exists in state.db (orphan refs would
|
||||
# otherwise leak through and the frontend would treat them as
|
||||
# sidebar grouping keys via #1358's _sessionLineageKey
|
||||
# fall-through)
|
||||
# 2) the parent's end_reason is one of {compression, cli_close} —
|
||||
# i.e. only TRUE continuations. Without this, two distinct
|
||||
# WebUI sessions sharing a `user_stop` parent would get
|
||||
# collapsed into a single sidebar row by #1358's helper
|
||||
# (it groups by parent_session_id as the third-fallback key).
|
||||
# (Opus pre-release review of v0.50.251, SHOULD-FIX 1.)
|
||||
parent_row = rows.get(parent_id) if parent_id else None
|
||||
if parent_row and parent_row.get('end_reason') in {'compression', 'cli_close'}:
|
||||
metadata.setdefault(sid, {})['parent_session_id'] = parent_id
|
||||
|
||||
root_id = sid
|
||||
current_id = sid
|
||||
segment_count = 1
|
||||
seen = {sid}
|
||||
while True:
|
||||
current = rows.get(current_id)
|
||||
parent_id = current.get('parent_session_id') if current else None
|
||||
parent = rows.get(parent_id) if parent_id else None
|
||||
if not parent or parent_id in seen:
|
||||
break
|
||||
if parent.get('end_reason') not in {'compression', 'cli_close'}:
|
||||
break
|
||||
root_id = parent_id
|
||||
current_id = parent_id
|
||||
seen.add(parent_id)
|
||||
segment_count += 1
|
||||
|
||||
if root_id != sid:
|
||||
entry = metadata.setdefault(sid, {})
|
||||
entry['_lineage_root_id'] = root_id
|
||||
entry['_compression_segment_count'] = segment_count
|
||||
|
||||
return metadata
|
||||
|
||||
@@ -2115,6 +2115,8 @@ STREAMS_LOCK = threading.Lock()
|
||||
CANCEL_FLAGS: dict = {}
|
||||
AGENT_INSTANCES: dict = {} # stream_id -> AIAgent instance for interrupt propagation
|
||||
STREAM_PARTIAL_TEXT: dict = {} # stream_id -> partial assistant text accumulated during streaming
|
||||
STREAM_REASONING_TEXT: dict = {} # stream_id -> reasoning trace accumulated during streaming (#1361 §A)
|
||||
STREAM_LIVE_TOOL_CALLS: dict = {} # stream_id -> live tool calls accumulated during streaming (#1361 §B)
|
||||
SERVER_START_TIME = time.time()
|
||||
|
||||
# Agent cache: reuse AIAgent across messages in the same WebUI session so that
|
||||
|
||||
+32
-1
@@ -15,7 +15,7 @@ from api.config import (
|
||||
get_effective_default_model, _get_session_agent_lock,
|
||||
)
|
||||
from api.workspace import get_last_workspace
|
||||
from api.agent_sessions import read_importable_agent_session_rows
|
||||
from api.agent_sessions import read_importable_agent_session_rows, read_session_lineage_metadata
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -746,6 +746,31 @@ def _hide_from_default_sidebar(session: dict) -> bool:
|
||||
return source == 'cron' or sid.startswith('cron_')
|
||||
|
||||
|
||||
def _active_state_db_path() -> Path:
|
||||
"""Return state.db for the active Hermes profile, degrading to HERMES_HOME."""
|
||||
try:
|
||||
from api.profiles import get_active_hermes_home
|
||||
hermes_home = Path(get_active_hermes_home()).expanduser().resolve()
|
||||
except Exception:
|
||||
hermes_home = Path(os.getenv('HERMES_HOME', str(HOME / '.hermes'))).expanduser().resolve()
|
||||
return hermes_home / 'state.db'
|
||||
|
||||
|
||||
def _enrich_sidebar_lineage_metadata(sessions: list[dict]) -> None:
|
||||
"""Attach state.db compression lineage metadata used by sidebar collapse."""
|
||||
try:
|
||||
metadata = read_session_lineage_metadata(
|
||||
_active_state_db_path(),
|
||||
{s.get('session_id') for s in sessions},
|
||||
)
|
||||
except Exception:
|
||||
return
|
||||
for session in sessions:
|
||||
sid = session.get('session_id')
|
||||
if sid in metadata:
|
||||
session.update(metadata[sid])
|
||||
|
||||
|
||||
def all_sessions():
|
||||
active_stream_ids = _active_stream_ids()
|
||||
# Phase C: try index first for O(1) read; fall back to full scan
|
||||
@@ -804,6 +829,7 @@ def all_sessions():
|
||||
for s in result:
|
||||
if not s.get('profile'):
|
||||
s['profile'] = 'default'
|
||||
_enrich_sidebar_lineage_metadata(result)
|
||||
return result
|
||||
except Exception:
|
||||
logger.debug("Failed to load session index, falling back to full scan")
|
||||
@@ -832,6 +858,7 @@ def all_sessions():
|
||||
for s in result:
|
||||
if not s.get('profile'):
|
||||
s['profile'] = 'default'
|
||||
_enrich_sidebar_lineage_metadata(result)
|
||||
return result
|
||||
|
||||
|
||||
@@ -1015,6 +1042,10 @@ def get_cli_sessions() -> list:
|
||||
'raw_source': row.get('raw_source'),
|
||||
'session_source': row.get('session_source'),
|
||||
'source_label': row.get('source_label'),
|
||||
'parent_session_id': row.get('parent_session_id'),
|
||||
'_lineage_root_id': row.get('_lineage_root_id'),
|
||||
'_lineage_tip_id': row.get('_lineage_tip_id'),
|
||||
'_compression_segment_count': row.get('_compression_segment_count'),
|
||||
'is_cli_session': True,
|
||||
})
|
||||
except Exception as _cli_err:
|
||||
|
||||
+2
-4
@@ -150,12 +150,10 @@ def get_hermes_home_for_profile(name: str) -> Path:
|
||||
empty, 'default', or does not match the profile-name format (rejects path
|
||||
traversal such as '../../etc').
|
||||
"""
|
||||
if not name or name == 'default' or not _PROFILE_ID_RE.match(name):
|
||||
if not name or name == 'default' or not _PROFILE_ID_RE.fullmatch(name):
|
||||
return _DEFAULT_HERMES_HOME
|
||||
profile_dir = _DEFAULT_HERMES_HOME / 'profiles' / name
|
||||
if profile_dir.is_dir():
|
||||
return profile_dir
|
||||
return _DEFAULT_HERMES_HOME
|
||||
return profile_dir
|
||||
|
||||
|
||||
_TERMINAL_ENV_MAPPINGS = {
|
||||
|
||||
+20
-2
@@ -95,10 +95,28 @@ def _cron_output_content_window(text: str, limit: int = _CRON_OUTPUT_CONTENT_LIM
|
||||
def _run_cron_tracked(job):
|
||||
"""Wrapper that tracks running state around cron.scheduler.run_job."""
|
||||
from cron.scheduler import run_job # import here — runs inside a worker thread
|
||||
from cron.jobs import mark_job_run, save_job_output
|
||||
|
||||
job_id = job.get("id", "")
|
||||
try:
|
||||
run_job(job)
|
||||
success, output, final_response, error = run_job(job)
|
||||
save_job_output(job_id, output)
|
||||
|
||||
# Match the scheduled cron path: an apparently successful run with no
|
||||
# final response should not leave the job looking healthy.
|
||||
if success and not final_response:
|
||||
success = False
|
||||
error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
|
||||
|
||||
mark_job_run(job_id, success, error)
|
||||
except Exception as e:
|
||||
logger.exception("Manual cron run failed for job %s", job_id)
|
||||
try:
|
||||
mark_job_run(job_id, False, str(e))
|
||||
except Exception:
|
||||
logger.debug("Failed to mark manual cron run failure for %s", job_id)
|
||||
finally:
|
||||
_mark_cron_done(job.get("id", ""))
|
||||
_mark_cron_done(job_id)
|
||||
|
||||
_PROVIDER_ALIASES = {
|
||||
"claude": "anthropic",
|
||||
|
||||
+79
-11
@@ -21,6 +21,7 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
from api.config import (
|
||||
STREAMS, STREAMS_LOCK, CANCEL_FLAGS, AGENT_INSTANCES, STREAM_PARTIAL_TEXT,
|
||||
STREAM_REASONING_TEXT, STREAM_LIVE_TOOL_CALLS,
|
||||
LOCK, SESSIONS, SESSION_DIR,
|
||||
_get_session_agent_lock, _set_thread_env, _clear_thread_env,
|
||||
SESSION_AGENT_LOCKS, SESSION_AGENT_LOCKS_LOCK,
|
||||
@@ -1373,6 +1374,8 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
with STREAMS_LOCK:
|
||||
CANCEL_FLAGS[stream_id] = cancel_event
|
||||
STREAM_PARTIAL_TEXT[stream_id] = '' # start accumulating partial text (#893)
|
||||
STREAM_REASONING_TEXT[stream_id] = '' # start accumulating reasoning trace (#1361 §A)
|
||||
STREAM_LIVE_TOOL_CALLS[stream_id] = [] # start accumulating tool calls (#1361 §B)
|
||||
|
||||
# Register this stream with the global streaming meter
|
||||
meter().begin_session(stream_id)
|
||||
@@ -1575,6 +1578,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
if text is None:
|
||||
return
|
||||
_reasoning_text += str(text)
|
||||
# Mirror to shared dict so cancel_stream() can persist it (#1361 §A)
|
||||
if stream_id in STREAM_REASONING_TEXT:
|
||||
STREAM_REASONING_TEXT[stream_id] += str(text)
|
||||
put('reasoning', {'text': str(text)})
|
||||
# Track reasoning tokens in the meter so TPS reflects all AI output
|
||||
meter().record_reasoning(stream_id, len(_reasoning_text))
|
||||
@@ -1607,6 +1613,9 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
reason_text = preview if event_type == 'reasoning.available' else name
|
||||
if reason_text:
|
||||
_reasoning_text += str(reason_text)
|
||||
# Mirror to shared dict so cancel_stream() can persist it (#1361 §A)
|
||||
if stream_id in STREAM_REASONING_TEXT:
|
||||
STREAM_REASONING_TEXT[stream_id] += str(reason_text)
|
||||
put('reasoning', {'text': str(reason_text)})
|
||||
meter().record_reasoning(stream_id, len(_reasoning_text))
|
||||
_emit_metering()
|
||||
@@ -1623,6 +1632,13 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
'name': name,
|
||||
'args': args if isinstance(args, dict) else {},
|
||||
})
|
||||
# Mirror to shared dict so cancel_stream() can persist it (#1361 §B)
|
||||
if stream_id in STREAM_LIVE_TOOL_CALLS:
|
||||
STREAM_LIVE_TOOL_CALLS[stream_id].append({
|
||||
'name': name,
|
||||
'args': args if isinstance(args, dict) else {},
|
||||
'done': False,
|
||||
})
|
||||
put('tool', {
|
||||
'event_type': event_type or 'tool.started',
|
||||
'name': name,
|
||||
@@ -1651,6 +1667,16 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
live_tc['duration'] = cb_kwargs.get('duration')
|
||||
live_tc['is_error'] = bool(cb_kwargs.get('is_error', False))
|
||||
break
|
||||
# Mirror done state to shared dict (#1361 §B)
|
||||
if stream_id in STREAM_LIVE_TOOL_CALLS:
|
||||
for shared_tc in reversed(STREAM_LIVE_TOOL_CALLS[stream_id]):
|
||||
if shared_tc.get('done'):
|
||||
continue
|
||||
if not name or shared_tc.get('name') == name:
|
||||
shared_tc['done'] = True
|
||||
shared_tc['duration'] = cb_kwargs.get('duration')
|
||||
shared_tc['is_error'] = bool(cb_kwargs.get('is_error', False))
|
||||
break
|
||||
# Signal the checkpoint thread that new work has completed (Issue #765).
|
||||
# Each completed tool call is a meaningful unit of progress worth persisting.
|
||||
_checkpoint_activity[0] += 1
|
||||
@@ -2443,6 +2469,8 @@ def _run_agent_streaming(session_id, msg_text, model, workspace, stream_id, atta
|
||||
CANCEL_FLAGS.pop(stream_id, None)
|
||||
AGENT_INSTANCES.pop(stream_id, None) # Clean up agent instance reference
|
||||
STREAM_PARTIAL_TEXT.pop(stream_id, None) # Clean up partial text buffer (#893)
|
||||
STREAM_REASONING_TEXT.pop(stream_id, None) # Clean up reasoning trace (#1361 §A)
|
||||
STREAM_LIVE_TOOL_CALLS.pop(stream_id, None) # Clean up tool calls (#1361 §B)
|
||||
|
||||
# ============================================================
|
||||
# SECTION: HTTP Request Handler
|
||||
@@ -2630,6 +2658,17 @@ def cancel_stream(stream_id: str) -> bool:
|
||||
live_partials = getattr(_live_config, 'STREAM_PARTIAL_TEXT', partial_texts)
|
||||
if live_partials is not partial_texts:
|
||||
_cancel_partial_text = live_partials.get(stream_id, '')
|
||||
# Capture reasoning trace and live tool calls (#1361 §A + §B)
|
||||
_cancel_reasoning = STREAM_REASONING_TEXT.get(stream_id, '')
|
||||
if not _cancel_reasoning:
|
||||
live_reasoning = getattr(_live_config, 'STREAM_REASONING_TEXT', STREAM_REASONING_TEXT)
|
||||
if live_reasoning is not STREAM_REASONING_TEXT:
|
||||
_cancel_reasoning = live_reasoning.get(stream_id, '')
|
||||
_cancel_tool_calls = STREAM_LIVE_TOOL_CALLS.get(stream_id, [])
|
||||
if not _cancel_tool_calls:
|
||||
live_tools = getattr(_live_config, 'STREAM_LIVE_TOOL_CALLS', STREAM_LIVE_TOOL_CALLS)
|
||||
if live_tools is not STREAM_LIVE_TOOL_CALLS:
|
||||
_cancel_tool_calls = live_tools.get(stream_id, [])
|
||||
|
||||
# Session cleanup outside STREAMS_LOCK to preserve lock ordering.
|
||||
# Acquire the per-session _agent_lock too, mirroring every other session
|
||||
@@ -2708,7 +2747,13 @@ def cancel_stream(stream_id: str) -> bool:
|
||||
# content IS kept in the history sent to the agent on the next user
|
||||
# message, letting the model continue from where it was cut off.
|
||||
# See the inner comment on the append call below for the rationale.
|
||||
#
|
||||
# #1361: Also persist reasoning trace and live tool calls that were
|
||||
# accumulated in thread-local variables but invisible to the cancel path.
|
||||
# This prevents paid-token data loss when cancelling mid-reasoning or
|
||||
# mid-tool-execution.
|
||||
partial_text = _cancel_partial_text.strip() if _cancel_partial_text else ''
|
||||
_stripped = ''
|
||||
if partial_text:
|
||||
import re as _re
|
||||
# Strip thinking/reasoning markup from partial content before saving.
|
||||
@@ -2721,17 +2766,40 @@ def cancel_stream(stream_id: str) -> bool:
|
||||
_stripped = _re.sub(r'<think(?:ing)?\b[^>]*>.*',
|
||||
'', _stripped,
|
||||
flags=_re.DOTALL | _re.IGNORECASE).strip()
|
||||
if _stripped:
|
||||
# Mark _partial=True for session/UI identification only.
|
||||
# Deliberately NOT _error=True — the partial content is real model
|
||||
# output and should be visible in conversation history so the model
|
||||
# can continue from it on the next turn (#893).
|
||||
_cs.messages.append({
|
||||
'role': 'assistant',
|
||||
'content': _stripped,
|
||||
'_partial': True,
|
||||
'timestamp': int(time.time()),
|
||||
})
|
||||
# Determine whether there is anything to preserve beyond just the
|
||||
# cancel marker. Content text, reasoning trace, or tool calls all
|
||||
# count (#1361 §C — previously only _stripped was checked, so a
|
||||
# reasoning-only or tool-only stream produced NO partial message).
|
||||
_has_reasoning = bool(_cancel_reasoning and _cancel_reasoning.strip())
|
||||
_has_tools = bool(_cancel_tool_calls)
|
||||
if _stripped or _has_reasoning or _has_tools:
|
||||
_partial_msg: dict = {
|
||||
'role': 'assistant',
|
||||
'content': _stripped, # may be empty for reasoning/tool-only turns
|
||||
'_partial': True,
|
||||
'timestamp': int(time.time()),
|
||||
}
|
||||
if _has_reasoning:
|
||||
_partial_msg['reasoning'] = _cancel_reasoning.strip()
|
||||
if _has_tools:
|
||||
# NOTE: store under the private '_partial_tool_calls' key
|
||||
# (NOT 'tool_calls'). The captured entries use the WebUI
|
||||
# internal shape {name, args, done, duration, is_error}
|
||||
# — they do NOT carry the OpenAI/Anthropic API id +
|
||||
# function: {name, arguments} envelope. If we put them
|
||||
# under 'tool_calls', `_sanitize_messages_for_api`
|
||||
# (which whitelists 'tool_calls' via _API_SAFE_MSG_KEYS)
|
||||
# would forward them to the next-turn LLM call and
|
||||
# strict providers (OpenAI, Anthropic, Z.AI/GLM) would
|
||||
# 400 on the malformed entries — turning a "data lost
|
||||
# on cancel" bug into a "next message returns 400"
|
||||
# bug, which is worse. The underscore-prefixed key is
|
||||
# not in the whitelist, so sanitize strips it. The UI
|
||||
# reads it via static/messages.js and renders it
|
||||
# alongside the regular tool_calls path.
|
||||
# (Opus pre-release review pass 2 of v0.50.251.)
|
||||
_partial_msg['_partial_tool_calls'] = list(_cancel_tool_calls)
|
||||
_cs.messages.append(_partial_msg)
|
||||
# Cancel marker — flagged _error=True so it is stripped from conversation
|
||||
# history on the next turn (prevents model from seeing "Task cancelled."
|
||||
# as a prior assistant reply).
|
||||
|
||||
@@ -0,0 +1,23 @@
|
||||
# Upstream Issues — Root Cause Analysis
|
||||
|
||||
## #1256: Browser tools fail with "Playwright not installed"
|
||||
|
||||
### Root Cause
|
||||
The check lives in **hermes-agent** (upstream), not hermes-webui:
|
||||
|
||||
```
|
||||
hermes-agent/tools/browser_tool.py → check_browser_requirements()
|
||||
```
|
||||
|
||||
`check_browser_requirements()` does not recognize CDP (Chrome DevTools Protocol) mode — it only looks for a local Playwright/Puppeteer install. When the agent runs in CDP mode (connecting to an existing browser), the check still fails.
|
||||
|
||||
### WebUI side
|
||||
The WebUI already passes `CLI_TOOLSETS` correctly per-request. The `enabled_toolsets` field in the cron/chat config is dynamic and works as intended.
|
||||
|
||||
### Fix required
|
||||
The fix must happen in `hermes-agent/tools/browser_tool.py`:
|
||||
- `check_browser_requirements()` should skip the Playwright check when CDP mode is configured
|
||||
- Or add a `BROWSER_MODE=cdp` env var that bypasses the local browser requirement
|
||||
|
||||
### Workaround
|
||||
Use `CLOUD_BROWSER=true` or configure `browser.base_url` to point to a remote CDP endpoint. This bypasses the local Playwright requirement.
|
||||
+6
-1
@@ -1072,9 +1072,14 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
|
||||
S.session=session;S.messages=(session.messages||[]).filter(m=>m&&m.role);
|
||||
const hasMessageToolMetadata=S.messages.some(m=>{
|
||||
if(!m||m.role!=='assistant') return false;
|
||||
// Recognize both the standard `tool_calls` (used by completed assistant
|
||||
// turns where the LLM emitted tool_call entries) and the WebUI-internal
|
||||
// `_partial_tool_calls` (used on Stop/Cancel partial messages — see
|
||||
// api/streaming.py cancel_stream).
|
||||
const hasTc=Array.isArray(m.tool_calls)&&m.tool_calls.length>0;
|
||||
const hasPartialTc=Array.isArray(m._partial_tool_calls)&&m._partial_tool_calls.length>0;
|
||||
const hasTu=Array.isArray(m.content)&&m.content.some(p=>p&&p.type==='tool_use');
|
||||
return hasTc||hasTu;
|
||||
return hasTc||hasPartialTc||hasTu;
|
||||
});
|
||||
if(!hasMessageToolMetadata&&session.tool_calls&&session.tool_calls.length){
|
||||
S.toolCalls=(session.tool_calls||[]).map(tc=>({...tc,done:true}));
|
||||
|
||||
@@ -0,0 +1,69 @@
|
||||
"""Regression tests for manual WebUI cron runs."""
|
||||
|
||||
import sys
|
||||
import types
|
||||
|
||||
|
||||
def test_manual_cron_run_saves_output_and_marks_job(monkeypatch):
|
||||
import api.routes as routes
|
||||
|
||||
calls = []
|
||||
|
||||
cron_pkg = types.ModuleType("cron")
|
||||
cron_pkg.__path__ = []
|
||||
|
||||
cron_jobs = types.ModuleType("cron.jobs")
|
||||
cron_jobs.save_job_output = lambda job_id, output: calls.append(
|
||||
("save", job_id, output)
|
||||
)
|
||||
cron_jobs.mark_job_run = lambda job_id, success, error=None: calls.append(
|
||||
("mark", job_id, success, error)
|
||||
)
|
||||
|
||||
cron_scheduler = types.ModuleType("cron.scheduler")
|
||||
cron_scheduler.run_job = lambda job: (True, "manual output", "done", None)
|
||||
|
||||
monkeypatch.setitem(sys.modules, "cron", cron_pkg)
|
||||
monkeypatch.setitem(sys.modules, "cron.jobs", cron_jobs)
|
||||
monkeypatch.setitem(sys.modules, "cron.scheduler", cron_scheduler)
|
||||
|
||||
routes._mark_cron_running("job123")
|
||||
routes._run_cron_tracked({"id": "job123"})
|
||||
|
||||
assert calls == [
|
||||
("save", "job123", "manual output"),
|
||||
("mark", "job123", True, None),
|
||||
]
|
||||
assert routes._is_cron_running("job123") == (False, 0.0)
|
||||
|
||||
|
||||
def test_manual_cron_run_marks_empty_response_as_failure(monkeypatch):
|
||||
import api.routes as routes
|
||||
|
||||
calls = []
|
||||
|
||||
cron_pkg = types.ModuleType("cron")
|
||||
cron_pkg.__path__ = []
|
||||
|
||||
cron_jobs = types.ModuleType("cron.jobs")
|
||||
cron_jobs.save_job_output = lambda job_id, output: calls.append(
|
||||
("save", job_id, output)
|
||||
)
|
||||
cron_jobs.mark_job_run = lambda job_id, success, error=None: calls.append(
|
||||
("mark", job_id, success, error)
|
||||
)
|
||||
|
||||
cron_scheduler = types.ModuleType("cron.scheduler")
|
||||
cron_scheduler.run_job = lambda job: (True, "manual output", "", None)
|
||||
|
||||
monkeypatch.setitem(sys.modules, "cron", cron_pkg)
|
||||
monkeypatch.setitem(sys.modules, "cron.jobs", cron_jobs)
|
||||
monkeypatch.setitem(sys.modules, "cron.scheduler", cron_scheduler)
|
||||
|
||||
routes._mark_cron_running("job-empty")
|
||||
routes._run_cron_tracked({"id": "job-empty"})
|
||||
|
||||
assert calls[0] == ("save", "job-empty", "manual output")
|
||||
assert calls[1][0:3] == ("mark", "job-empty", False)
|
||||
assert "empty response" in calls[1][3]
|
||||
assert routes._is_cron_running("job-empty") == (False, 0.0)
|
||||
@@ -333,6 +333,9 @@ def test_compression_chain_collapses_to_latest_tip_in_sidebar():
|
||||
# bubbles to the top by true recency, not by the root's stale activity.
|
||||
# tip messages are at t0+201 and t0+202, so last_activity = t0 + 202.
|
||||
assert abs(tip.get('updated_at') - (t0 + 202)) < 0.01
|
||||
assert tip.get('_lineage_root_id') == 'chain_root_001'
|
||||
assert tip.get('_lineage_tip_id') == 'chain_tip_001'
|
||||
assert tip.get('_compression_segment_count') == 3
|
||||
|
||||
from api.agent_sessions import read_importable_agent_session_rows
|
||||
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
"""Tests for issue #1195: sessions must route to the correct profile directory
|
||||
even when that profile directory does not exist yet on disk."""
|
||||
|
||||
import os
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ── helpers ──────────────────────────────────────────────────────────────────
|
||||
|
||||
def _make_hermes_home(base: Path, profile_name: str | None = None) -> Path:
|
||||
"""Create a temp HERMES_HOME (with optional profile dir) and return it."""
|
||||
hermes_home = base / ".hermes"
|
||||
hermes_home.mkdir(parents=True, exist_ok=True)
|
||||
if profile_name:
|
||||
(hermes_home / "profiles" / profile_name).mkdir(parents=True, exist_ok=True)
|
||||
return hermes_home
|
||||
|
||||
|
||||
# ── tests ────────────────────────────────────────────────────────────────────
|
||||
|
||||
class TestGetHermesHomeForProfile:
|
||||
"""get_hermes_home_for_profile() must return the profile path regardless of
|
||||
whether the directory already exists on disk (#1195)."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _patch_default_home(self, tmp_path):
|
||||
"""Patch _DEFAULT_HERMES_HOME to a temp directory for isolation."""
|
||||
from api.profiles import _DEFAULT_HERMES_HOME as real_default
|
||||
|
||||
fake_home = tmp_path / ".hermes"
|
||||
fake_home.mkdir(parents=True)
|
||||
with patch("api.profiles._DEFAULT_HERMES_HOME", fake_home):
|
||||
yield fake_home, real_default
|
||||
|
||||
def test_existing_profile_returns_profile_dir(self, _patch_default_home):
|
||||
fake_home, _ = _patch_default_home
|
||||
from api.profiles import get_hermes_home_for_profile
|
||||
|
||||
# Create an existing profile directory
|
||||
profile_dir = fake_home / "profiles" / "ayan"
|
||||
profile_dir.mkdir(parents=True)
|
||||
|
||||
result = get_hermes_home_for_profile("ayan")
|
||||
assert result == profile_dir
|
||||
|
||||
def test_nonexistent_profile_still_returns_profile_path(self, _patch_default_home):
|
||||
"""Core bug fix: profile dir doesn't exist yet but should still route there."""
|
||||
fake_home, _ = _patch_default_home
|
||||
from api.profiles import get_hermes_home_for_profile
|
||||
|
||||
# Do NOT create the profile directory
|
||||
expected = fake_home / "profiles" / "newprofile"
|
||||
assert not expected.exists() # confirm it doesn't exist
|
||||
|
||||
result = get_hermes_home_for_profile("newprofile")
|
||||
assert result == expected, "Should route to profile path even when dir missing"
|
||||
|
||||
def test_none_returns_default(self, _patch_default_home):
|
||||
fake_home, _ = _patch_default_home
|
||||
from api.profiles import get_hermes_home_for_profile
|
||||
|
||||
result = get_hermes_home_for_profile(None)
|
||||
assert result == fake_home
|
||||
|
||||
def test_empty_string_returns_default(self, _patch_default_home):
|
||||
fake_home, _ = _patch_default_home
|
||||
from api.profiles import get_hermes_home_for_profile
|
||||
|
||||
result = get_hermes_home_for_profile("")
|
||||
assert result == fake_home
|
||||
|
||||
def test_default_string_returns_default(self, _patch_default_home):
|
||||
fake_home, _ = _patch_default_home
|
||||
from api.profiles import get_hermes_home_for_profile
|
||||
|
||||
result = get_hermes_home_for_profile("default")
|
||||
assert result == fake_home
|
||||
@@ -0,0 +1,317 @@
|
||||
"""Regression tests for #1361 — Stop/Cancel discards already-streamed content.
|
||||
|
||||
Three distinct data-loss paths on cancel:
|
||||
|
||||
§A Reasoning text accumulated in a thread-local `_reasoning_text` is never
|
||||
visible to cancel_stream(), so it's lost on cancel.
|
||||
§B Live tool calls accumulated in thread-local `_live_tool_calls` are lost
|
||||
on cancel — only STREAM_PARTIAL_TEXT is captured.
|
||||
§C When the entire streamed output is reasoning (no visible tokens),
|
||||
_stripped is empty after regex cleanup, so NO partial assistant message
|
||||
is appended — only the *Task cancelled.* marker survives.
|
||||
|
||||
All three fix the same "tokens-paid-for-data-loss" class of bug.
|
||||
"""
|
||||
|
||||
import pathlib
|
||||
import queue
|
||||
import re
|
||||
import threading
|
||||
from unittest.mock import Mock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
import api.config as config
|
||||
import api.models as models
|
||||
import api.streaming as streaming
|
||||
from api.models import Session
|
||||
from api.streaming import cancel_stream
|
||||
|
||||
REPO_ROOT = pathlib.Path(__file__).parent.parent.resolve()
|
||||
|
||||
|
||||
# ── Fixtures ────────────────────────────────────────────────────────────────
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate_session_dir(tmp_path, monkeypatch):
|
||||
"""Redirect SESSION_DIR / SESSION_INDEX_FILE to an isolated temp dir."""
|
||||
session_dir = tmp_path / "sessions"
|
||||
session_dir.mkdir()
|
||||
index_file = session_dir / "_index.json"
|
||||
monkeypatch.setattr(models, "SESSION_DIR", session_dir)
|
||||
monkeypatch.setattr(models, "SESSION_INDEX_FILE", index_file)
|
||||
models.SESSIONS.clear()
|
||||
yield
|
||||
models.SESSIONS.clear()
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate_stream_state():
|
||||
"""Clear all shared streaming dicts before/after each test."""
|
||||
config.STREAMS.clear()
|
||||
config.CANCEL_FLAGS.clear()
|
||||
config.AGENT_INSTANCES.clear()
|
||||
config.STREAM_PARTIAL_TEXT.clear()
|
||||
# New shared dicts for §A and §B
|
||||
if hasattr(config, 'STREAM_REASONING_TEXT'):
|
||||
config.STREAM_REASONING_TEXT.clear()
|
||||
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
|
||||
config.STREAM_LIVE_TOOL_CALLS.clear()
|
||||
yield
|
||||
config.STREAMS.clear()
|
||||
config.CANCEL_FLAGS.clear()
|
||||
config.AGENT_INSTANCES.clear()
|
||||
config.STREAM_PARTIAL_TEXT.clear()
|
||||
if hasattr(config, 'STREAM_REASONING_TEXT'):
|
||||
config.STREAM_REASONING_TEXT.clear()
|
||||
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
|
||||
config.STREAM_LIVE_TOOL_CALLS.clear()
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate_agent_locks():
|
||||
config.SESSION_AGENT_LOCKS.clear()
|
||||
yield
|
||||
config.SESSION_AGENT_LOCKS.clear()
|
||||
|
||||
|
||||
def _make_session(session_id="cancel_sid_1361",
|
||||
pending_msg="Help me debug this",
|
||||
messages=None):
|
||||
"""Build a session in mid-stream state."""
|
||||
s = Session(
|
||||
session_id=session_id,
|
||||
title="Test Session",
|
||||
messages=messages or [],
|
||||
)
|
||||
s.pending_user_message = pending_msg
|
||||
s.pending_attachments = []
|
||||
s.pending_started_at = None
|
||||
s.active_stream_id = "stream_1361"
|
||||
s.save()
|
||||
models.SESSIONS[session_id] = s
|
||||
return s
|
||||
|
||||
|
||||
def _setup_cancel_state(session_id, stream_id="stream_1361"):
|
||||
"""Wire up STREAMS/CANCEL_FLAGS/AGENT_INSTANCES for cancel_stream()."""
|
||||
config.STREAMS[stream_id] = queue.Queue()
|
||||
config.CANCEL_FLAGS[stream_id] = threading.Event()
|
||||
mock_agent = Mock()
|
||||
mock_agent.session_id = session_id
|
||||
mock_agent.interrupt = Mock()
|
||||
config.AGENT_INSTANCES[stream_id] = mock_agent
|
||||
return stream_id, mock_agent
|
||||
|
||||
|
||||
# ── §A: Reasoning text lost on cancel ───────────────────────────────────────
|
||||
|
||||
class TestCancelPreservesReasoningText:
|
||||
"""§A: _reasoning_text is thread-local and invisible to cancel_stream().
|
||||
|
||||
After fix: reasoning text should be persisted in a shared dict
|
||||
(STREAM_REASONING_TEXT) keyed by stream_id, and cancel_stream()
|
||||
should append it as a 'reasoning' field on the partial assistant message.
|
||||
"""
|
||||
|
||||
def test_cancel_with_reasoning_only_preserves_reasoning(self):
|
||||
"""Cancel during reasoning phase (no visible tokens) should persist reasoning."""
|
||||
sid = "test_1361_a1"
|
||||
stream_id = "stream_a1"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
# Simulate: reasoning was accumulated but no visible tokens
|
||||
reasoning = "Let me think about this step by step..."
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = "" # no visible tokens
|
||||
|
||||
if hasattr(config, 'STREAM_REASONING_TEXT'):
|
||||
config.STREAM_REASONING_TEXT[stream_id] = reasoning
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
# Reload and check
|
||||
s2 = models.SESSIONS[sid]
|
||||
msgs = s2.messages
|
||||
# There should be a partial assistant message with reasoning
|
||||
assistant_msgs = [m for m in msgs if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
has_reasoning = any(m.get('reasoning') for m in assistant_msgs)
|
||||
assert has_reasoning, \
|
||||
f"Expected reasoning field on partial assistant msg after cancel. Got messages: {assistant_msgs}"
|
||||
|
||||
def test_cancel_with_reasoning_and_partial_tokens_preserves_both(self):
|
||||
"""Cancel mid-stream with both reasoning and some visible tokens."""
|
||||
sid = "test_1361_a2"
|
||||
stream_id = "stream_a2"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
reasoning = "Let me analyze the code..."
|
||||
partial_text = "Based on my analysis, the bug is in the"
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = partial_text
|
||||
|
||||
if hasattr(config, 'STREAM_REASONING_TEXT'):
|
||||
config.STREAM_REASONING_TEXT[stream_id] = reasoning
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
# Should have partial content
|
||||
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
|
||||
has_content = any(m.get('content') for m in partial_msgs)
|
||||
assert has_content, \
|
||||
f"Expected partial assistant content after cancel. Got: {partial_msgs}"
|
||||
|
||||
def test_cancel_without_reasoning_dict_works_as_before(self):
|
||||
"""If STREAM_REASONING_TEXT doesn't exist yet (pre-fix), cancel still works."""
|
||||
sid = "test_1361_a3"
|
||||
stream_id = "stream_a3"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = "Some partial text"
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
msgs = s2.messages
|
||||
# Should have the cancel marker
|
||||
has_cancel = any(
|
||||
isinstance(m, dict) and m.get('role') == 'assistant' and m.get('_error')
|
||||
for m in msgs
|
||||
)
|
||||
assert has_cancel, "Cancel marker should always be present"
|
||||
|
||||
|
||||
# ── §B: Tool calls lost on cancel ───────────────────────────────────────────
|
||||
|
||||
class TestCancelPreservesToolCalls:
|
||||
"""§B: _live_tool_calls is thread-local and invisible to cancel_stream().
|
||||
|
||||
After fix: tool calls should be persisted in a shared dict
|
||||
(STREAM_LIVE_TOOL_CALLS) keyed by stream_id, and cancel_stream()
|
||||
should append them as tool_call entries on the partial assistant message.
|
||||
"""
|
||||
|
||||
def test_cancel_with_tool_calls_preserves_tools(self):
|
||||
"""Cancel after tool execution should preserve the tool call info."""
|
||||
sid = "test_1361_b1"
|
||||
stream_id = "stream_b1"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = ""
|
||||
|
||||
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
|
||||
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
|
||||
{"name": "read_file", "args": {"path": "/tmp/test.py"}, "done": True},
|
||||
{"name": "terminal", "args": {"command": "ls"}, "done": False},
|
||||
]
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
has_tools = any(m.get('_partial_tool_calls') or m.get('tool_calls') or m.get('tools') for m in assistant_msgs)
|
||||
assert has_tools, \
|
||||
f"Expected _partial_tool_calls on partial assistant msg after cancel. Got: {assistant_msgs}"
|
||||
|
||||
def test_cancel_with_tools_and_text_preserves_both(self):
|
||||
"""Cancel after tools + partial text should keep both."""
|
||||
sid = "test_1361_b2"
|
||||
stream_id = "stream_b2"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = "Here's what I found:"
|
||||
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
|
||||
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
|
||||
{"name": "web_search", "args": {"query": "test"}, "done": True},
|
||||
]
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
|
||||
has_content = any(m.get('content') for m in partial_msgs)
|
||||
assert has_content, \
|
||||
f"Expected partial content with tools after cancel. Got: {partial_msgs}"
|
||||
|
||||
|
||||
# ── §C: Empty _stripped skips entire append ─────────────────────────────────
|
||||
|
||||
class TestCancelWithReasoningOnlyNoText:
|
||||
"""§C: When streaming was 100% reasoning (no visible tokens), _stripped is
|
||||
empty after regex cleanup, so no partial assistant message is appended.
|
||||
|
||||
After fix: even when _stripped is empty, if reasoning or tool calls exist,
|
||||
a partial assistant message should be appended (with no content, but with
|
||||
reasoning and/or tool_calls fields).
|
||||
"""
|
||||
|
||||
def test_reasoning_only_creates_partial_message(self):
|
||||
"""Cancel after reasoning-only output should still create a partial msg."""
|
||||
sid = "test_1361_c1"
|
||||
stream_id = "stream_c1"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
# Only reasoning, no visible tokens at all
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = ""
|
||||
|
||||
if hasattr(config, 'STREAM_REASONING_TEXT'):
|
||||
config.STREAM_REASONING_TEXT[stream_id] = "Deep reasoning here..."
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
# Should NOT be only the cancel marker — there should be a partial msg
|
||||
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
|
||||
assert len(partial_msgs) > 0, \
|
||||
f"Expected at least one partial assistant msg for reasoning-only cancel. Got: {assistant_msgs}"
|
||||
|
||||
def test_tools_only_creates_partial_message(self):
|
||||
"""Cancel after tool-only output (no text, no reasoning) should still create a partial msg."""
|
||||
sid = "test_1361_c2"
|
||||
stream_id = "stream_c2"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = ""
|
||||
|
||||
if hasattr(config, 'STREAM_LIVE_TOOL_CALLS'):
|
||||
config.STREAM_LIVE_TOOL_CALLS[stream_id] = [
|
||||
{"name": "read_file", "args": {"path": "/tmp/x"}, "done": True},
|
||||
]
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
|
||||
assert len(partial_msgs) > 0, \
|
||||
f"Expected at least one partial assistant msg for tools-only cancel. Got: {assistant_msgs}"
|
||||
|
||||
def test_no_reasoning_no_tools_no_partial(self):
|
||||
"""Cancel with no reasoning and no tools and no text = only cancel marker (no change)."""
|
||||
sid = "test_1361_c3"
|
||||
stream_id = "stream_c3"
|
||||
s = _make_session(session_id=sid)
|
||||
_setup_cancel_state(sid, stream_id)
|
||||
|
||||
config.STREAM_PARTIAL_TEXT[stream_id] = ""
|
||||
|
||||
cancel_stream(stream_id)
|
||||
|
||||
s2 = models.SESSIONS[sid]
|
||||
assistant_msgs = [m for m in s2.messages if isinstance(m, dict) and m.get('role') == 'assistant']
|
||||
# Should only have the cancel marker, no partial messages
|
||||
partial_msgs = [m for m in assistant_msgs if m.get('_partial')]
|
||||
cancel_msgs = [m for m in assistant_msgs if m.get('_error')]
|
||||
assert len(partial_msgs) == 0, \
|
||||
f"Expected no partial msg when nothing was streamed. Got partials: {partial_msgs}"
|
||||
assert len(cancel_msgs) == 1, \
|
||||
f"Expected exactly 1 cancel marker. Got: {cancel_msgs}"
|
||||
+31
-9
@@ -40,13 +40,18 @@ def test_get_hermes_home_for_profile_returns_profile_subdir(tmp_path, monkeypatc
|
||||
assert result == profile_dir
|
||||
|
||||
|
||||
def test_get_hermes_home_for_profile_falls_back_for_missing_profile(tmp_path, monkeypatch):
|
||||
"""R19c: Named profile that does not exist falls back to base home."""
|
||||
def test_get_hermes_home_for_profile_returns_profile_path_for_missing_profile(tmp_path, monkeypatch):
|
||||
"""R19c: Named profile that does not exist on disk now returns the
|
||||
profile-scoped path (created on first use by the agent layer), NOT the
|
||||
base home. Tightened in v0.50.251 / PR #1373 to fix #1195: the previous
|
||||
is_dir() fallback caused new profiles to silently route every session
|
||||
back to the default profile until the directory existed on disk.
|
||||
Path traversal is still blocked by the _PROFILE_ID_RE regex (R19j)."""
|
||||
import api.profiles as p
|
||||
|
||||
monkeypatch.setattr(p, '_DEFAULT_HERMES_HOME', tmp_path)
|
||||
result = p.get_hermes_home_for_profile('ghost')
|
||||
assert result == tmp_path
|
||||
assert result == tmp_path / 'profiles' / 'ghost'
|
||||
|
||||
|
||||
def test_get_hermes_home_for_profile_does_not_mutate_globals():
|
||||
@@ -171,8 +176,10 @@ def test_sessions_js_sends_profile_in_new_session_post():
|
||||
|
||||
def test_get_hermes_home_for_profile_rejects_path_traversal():
|
||||
"""R19j: get_hermes_home_for_profile() must reject names that don't match
|
||||
_PROFILE_ID_RE (e.g. path traversal like '../../etc') and return the base home.
|
||||
The regex guard is defence-in-depth on top of the is_dir() fallback."""
|
||||
_PROFILE_ID_RE (e.g. path traversal like '../../etc') and return the base
|
||||
home. After v0.50.251 / PR #1373 removed the is_dir() fallback, the regex
|
||||
is the SOLE guard against path traversal — verify each known-bad shape
|
||||
still returns the base home, not a traversed path."""
|
||||
import api.profiles as p
|
||||
base = p._DEFAULT_HERMES_HOME
|
||||
assert p.get_hermes_home_for_profile('../../etc') == base
|
||||
@@ -180,7 +187,22 @@ def test_get_hermes_home_for_profile_rejects_path_traversal():
|
||||
assert p.get_hermes_home_for_profile('/absolute/path') == base
|
||||
assert p.get_hermes_home_for_profile('has spaces') == base
|
||||
assert p.get_hermes_home_for_profile('UPPERCASE') == base
|
||||
# Valid names still work
|
||||
assert p.get_hermes_home_for_profile('alice') == base # nonexistent → fallback
|
||||
assert p.get_hermes_home_for_profile('my-profile') == base
|
||||
assert p.get_hermes_home_for_profile('profile_1') == base
|
||||
# Valid names now route to the profile-scoped path (created on first use).
|
||||
# Previously these returned `base` because no profile dir existed on disk.
|
||||
assert p.get_hermes_home_for_profile('alice') == base / 'profiles' / 'alice'
|
||||
assert p.get_hermes_home_for_profile('my-profile') == base / 'profiles' / 'my-profile'
|
||||
assert p.get_hermes_home_for_profile('profile_1') == base / 'profiles' / 'profile_1'
|
||||
# R19j coverage gaps closed in v0.50.251 per Opus pre-release review:
|
||||
# - Trailing-newline names must be rejected (re.match would let them through;
|
||||
# re.fullmatch correctly anchors $). Catches the match-vs-fullmatch footgun.
|
||||
assert p.get_hermes_home_for_profile('valid\n') == base
|
||||
assert p.get_hermes_home_for_profile('a\n') == base
|
||||
# - Length boundaries: 64 chars (max valid: 1 + 63 suffix) routes to profile path,
|
||||
# 65 chars rejected.
|
||||
assert p.get_hermes_home_for_profile('a' * 64) == base / 'profiles' / ('a' * 64)
|
||||
assert p.get_hermes_home_for_profile('a' * 65) == base
|
||||
# - Single-char name is the minimum valid form.
|
||||
assert p.get_hermes_home_for_profile('a') == base / 'profiles' / 'a'
|
||||
# - Non-ASCII / Unicode-trick names are rejected by the ASCII-only charset.
|
||||
assert p.get_hermes_home_for_profile('voilà') == base
|
||||
assert p.get_hermes_home_for_profile('名前') == base
|
||||
|
||||
@@ -0,0 +1,290 @@
|
||||
"""Regression tests for the v0.50.251 pre-release fixes to PR #1370.
|
||||
|
||||
PR #1370 originally did `SELECT id, parent_session_id, end_reason FROM
|
||||
sessions` (full table scan) on every call. At ~1000 rows the indexed
|
||||
lookup was 50x faster; the pre-release fix replaces the full scan with
|
||||
a parameterized `WHERE id IN (...)` query that hits PRIMARY KEY +
|
||||
idx_sessions_parent.
|
||||
|
||||
Also pins: orphan-parent references (parent_session_id points to a
|
||||
row that doesn't exist in state.db) must NOT be exposed in the API
|
||||
output, because #1358's frontend `_sessionLineageKey` falls through to
|
||||
`parent_session_id` when `_lineage_root_id` is missing — and would
|
||||
group orphans by a key no other session shares (cosmetic dead state).
|
||||
"""
|
||||
|
||||
import os
|
||||
import sqlite3
|
||||
import time
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
def _make_db(path):
|
||||
conn = sqlite3.connect(str(path))
|
||||
conn.executescript("""
|
||||
CREATE TABLE sessions (
|
||||
id TEXT PRIMARY KEY,
|
||||
source TEXT,
|
||||
title TEXT,
|
||||
model TEXT,
|
||||
started_at REAL NOT NULL,
|
||||
message_count INTEGER DEFAULT 0,
|
||||
parent_session_id TEXT,
|
||||
ended_at REAL,
|
||||
end_reason TEXT
|
||||
);
|
||||
CREATE INDEX idx_sessions_parent ON sessions(parent_session_id);
|
||||
""")
|
||||
return conn
|
||||
|
||||
|
||||
def _insert(conn, sid, *, parent=None, end_reason=None, started_at=None):
|
||||
conn.execute(
|
||||
"INSERT INTO sessions (id, source, title, model, started_at, parent_session_id, end_reason) "
|
||||
"VALUES (?, 'webui', ?, 'openai/gpt-5', ?, ?, ?)",
|
||||
(sid, sid, started_at or time.time(), parent, end_reason),
|
||||
)
|
||||
conn.commit()
|
||||
|
||||
|
||||
def test_does_not_full_scan_sessions_table(tmp_path, monkeypatch):
|
||||
"""The perf-critical invariant: must NOT do a `SELECT * FROM sessions`
|
||||
or `SELECT id, parent_session_id, end_reason FROM sessions` without a
|
||||
WHERE clause. Full scans regress sidebar refresh latency on power users
|
||||
with 1000+ session rows.
|
||||
"""
|
||||
from api import agent_sessions
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
# Insert 500 unrelated rows + 5 we care about
|
||||
for i in range(500):
|
||||
_insert(conn, f"unrelated_{i:04d}")
|
||||
_insert(conn, "wanted_1")
|
||||
_insert(conn, "wanted_2", parent="wanted_1", end_reason=None)
|
||||
conn.close()
|
||||
|
||||
# Track every SQL the function issues
|
||||
queries = []
|
||||
real_connect = sqlite3.connect
|
||||
|
||||
class _TrackingConn:
|
||||
def __init__(self, *args, **kw):
|
||||
self._real = real_connect(*args, **kw)
|
||||
def cursor(self):
|
||||
return _TrackingCursor(self._real.cursor())
|
||||
def __enter__(self):
|
||||
return self
|
||||
def __exit__(self, *a):
|
||||
return self._real.__exit__(*a)
|
||||
@property
|
||||
def row_factory(self):
|
||||
return self._real.row_factory
|
||||
@row_factory.setter
|
||||
def row_factory(self, v):
|
||||
self._real.row_factory = v
|
||||
|
||||
class _TrackingCursor:
|
||||
def __init__(self, real):
|
||||
self._real = real
|
||||
def execute(self, sql, *args):
|
||||
queries.append(sql)
|
||||
return self._real.execute(sql, *args)
|
||||
def fetchall(self):
|
||||
return self._real.fetchall()
|
||||
def fetchone(self):
|
||||
return self._real.fetchone()
|
||||
|
||||
monkeypatch.setattr(sqlite3, "connect", _TrackingConn)
|
||||
agent_sessions.read_session_lineage_metadata(db, ["wanted_1", "wanted_2"])
|
||||
|
||||
# No query may select from sessions without a WHERE clause
|
||||
bad = [q for q in queries
|
||||
if "from sessions" in q.lower()
|
||||
and "where" not in q.lower()
|
||||
and "pragma" not in q.lower()]
|
||||
assert not bad, (
|
||||
f"read_session_lineage_metadata must scope its SELECTs to specific "
|
||||
f"session ids (PR #1370 originally did a full scan). Found unscoped "
|
||||
f"queries: {bad}"
|
||||
)
|
||||
|
||||
|
||||
def test_orphan_parent_reference_not_exposed_in_metadata(tmp_path):
|
||||
"""If a session row references a parent that doesn't exist in state.db
|
||||
(orphan), the API output must NOT include `parent_session_id` — because
|
||||
#1358's frontend lineage helper would treat it as a sidebar grouping key
|
||||
and cluster the orphan into a never-collapsing single-row group.
|
||||
"""
|
||||
from api.agent_sessions import read_session_lineage_metadata
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
_insert(conn, "child", parent="missing_parent", end_reason=None)
|
||||
conn.close()
|
||||
|
||||
result = read_session_lineage_metadata(db, ["child"])
|
||||
assert "child" not in result or "parent_session_id" not in result["child"], (
|
||||
f"Orphan parent_session_id leaked into API output: {result}. "
|
||||
f"This causes the frontend lineage helper to group the orphan under "
|
||||
f"a key no other session shares — cosmetic dead state on the wire."
|
||||
)
|
||||
|
||||
|
||||
def test_cycle_in_parent_chain_terminates(tmp_path):
|
||||
"""Pathological data with a parent cycle (A→B→A) must not infinite-loop."""
|
||||
from api.agent_sessions import read_session_lineage_metadata
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
_insert(conn, "A", parent="B", end_reason="compression")
|
||||
_insert(conn, "B", parent="A", end_reason="compression")
|
||||
conn.close()
|
||||
|
||||
import threading
|
||||
done = threading.Event()
|
||||
result_box = []
|
||||
|
||||
def worker():
|
||||
result_box.append(read_session_lineage_metadata(db, ["A"]))
|
||||
done.set()
|
||||
|
||||
t = threading.Thread(target=worker, daemon=True)
|
||||
t.start()
|
||||
finished = done.wait(timeout=2.0)
|
||||
assert finished, "read_session_lineage_metadata hung on a parent cycle"
|
||||
|
||||
result = result_box[0]
|
||||
# Cycle should terminate; segment count should be bounded (≤ 2 for A→B→A)
|
||||
seg = result.get("A", {}).get("_compression_segment_count", 0)
|
||||
assert 1 <= seg <= 5, f"Cycle should produce a small bounded count, got {seg}"
|
||||
|
||||
|
||||
def test_chain_walk_returns_correct_root_for_real_compression_chain(tmp_path):
|
||||
"""End-to-end behavioural test: 4-segment compression chain returns the
|
||||
correct root and a segment count of 4."""
|
||||
from api.agent_sessions import read_session_lineage_metadata
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
_insert(conn, "root", end_reason="compression")
|
||||
_insert(conn, "seg2", parent="root", end_reason="compression")
|
||||
_insert(conn, "seg3", parent="seg2", end_reason="compression")
|
||||
_insert(conn, "tip", parent="seg3", end_reason=None)
|
||||
conn.close()
|
||||
|
||||
result = read_session_lineage_metadata(db, ["tip"])
|
||||
assert result["tip"]["_lineage_root_id"] == "root"
|
||||
assert result["tip"]["_compression_segment_count"] == 4
|
||||
assert result["tip"]["parent_session_id"] == "seg3"
|
||||
|
||||
|
||||
def test_in_clause_chunked_for_large_session_set(tmp_path, monkeypatch):
|
||||
"""The first hop must chunk the IN clause into batches of <= 500.
|
||||
Without chunking, a power user with 2000+ sessions in the sidebar would
|
||||
trigger SQLITE_MAX_VARIABLE_NUMBER on Python 3.9's sqlite 3.31 (default
|
||||
limit 999), the OperationalError gets swallowed by the except wrapper,
|
||||
and lineage collapse silently disables forever for that user.
|
||||
(Opus pre-release review of v0.50.251, SHOULD-FIX 2.)
|
||||
"""
|
||||
from api import agent_sessions
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
# 1500 unrelated rows
|
||||
for i in range(1500):
|
||||
_insert(conn, f"session_{i:04d}")
|
||||
conn.close()
|
||||
|
||||
# Track each query's IN clause variable count
|
||||
in_counts = []
|
||||
real_connect = sqlite3.connect
|
||||
|
||||
class _TrackingConn:
|
||||
def __init__(self, *args, **kw):
|
||||
self._real = real_connect(*args, **kw)
|
||||
def cursor(self):
|
||||
return _TrackingCursor(self._real.cursor())
|
||||
def __enter__(self): return self
|
||||
def __exit__(self, *a): return self._real.__exit__(*a)
|
||||
@property
|
||||
def row_factory(self): return self._real.row_factory
|
||||
@row_factory.setter
|
||||
def row_factory(self, v): self._real.row_factory = v
|
||||
|
||||
class _TrackingCursor:
|
||||
def __init__(self, real): self._real = real
|
||||
def execute(self, sql, *args):
|
||||
if "IN (" in sql:
|
||||
in_counts.append(sql.count("?"))
|
||||
return self._real.execute(sql, *args)
|
||||
def fetchall(self): return self._real.fetchall()
|
||||
def fetchone(self): return self._real.fetchone()
|
||||
|
||||
monkeypatch.setattr(sqlite3, "connect", _TrackingConn)
|
||||
|
||||
# Request 1500 ids (the full set we inserted)
|
||||
wanted = [f"session_{i:04d}" for i in range(1500)]
|
||||
agent_sessions.read_session_lineage_metadata(db, wanted)
|
||||
|
||||
assert in_counts, "Expected at least one IN-clause query"
|
||||
over_limit = [n for n in in_counts if n > 999]
|
||||
assert not over_limit, (
|
||||
f"IN clause must be chunked to <= 999 vars to stay under SQLite default "
|
||||
f"limit (chunked to 500 in the implementation). Found queries with "
|
||||
f"{over_limit} variables — would raise OperationalError on older sqlite."
|
||||
)
|
||||
|
||||
|
||||
def test_two_children_sharing_non_continuation_parent_not_collapsed(tmp_path):
|
||||
"""The contract Opus flagged: when two distinct WebUI sessions share the
|
||||
same parent and that parent's end_reason is NOT compression/cli_close
|
||||
(e.g. 'user_stop'), neither child should expose parent_session_id.
|
||||
The frontend's #1358 lineage helper would otherwise group them under
|
||||
the parent's id key and incorrectly collapse the rows.
|
||||
"""
|
||||
from api.agent_sessions import read_session_lineage_metadata
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
_insert(conn, "shared_parent", end_reason="user_stop")
|
||||
_insert(conn, "child_a", parent="shared_parent", end_reason=None)
|
||||
_insert(conn, "child_b", parent="shared_parent", end_reason=None)
|
||||
conn.close()
|
||||
|
||||
result = read_session_lineage_metadata(db, ["child_a", "child_b"])
|
||||
# Neither child should expose parent_session_id (would let frontend
|
||||
# group them under the same key and collapse one)
|
||||
for sid in ["child_a", "child_b"]:
|
||||
entry = result.get(sid, {})
|
||||
assert "parent_session_id" not in entry, (
|
||||
f"{sid} leaked parent_session_id despite parent being non-continuation"
|
||||
)
|
||||
|
||||
|
||||
def test_non_compression_parent_does_not_extend_lineage(tmp_path):
|
||||
"""If parent's end_reason is something OTHER than 'compression' or
|
||||
'cli_close' (e.g. 'user_stop', 'session_reset', 'cron_complete'),
|
||||
the chain stops at that boundary AND parent_session_id is not exposed
|
||||
(tightened in v0.50.251 per Opus SHOULD-FIX 1 — exposing the parent
|
||||
ref would let the frontend group two children sharing the same
|
||||
non-continuation parent into a single sidebar row).
|
||||
"""
|
||||
from api.agent_sessions import read_session_lineage_metadata
|
||||
|
||||
db = tmp_path / "state.db"
|
||||
conn = _make_db(db)
|
||||
_insert(conn, "parent", end_reason="user_stop") # NOT compression
|
||||
_insert(conn, "child", parent="parent", end_reason=None)
|
||||
conn.close()
|
||||
|
||||
result = read_session_lineage_metadata(db, ["child"])
|
||||
# parent_session_id must NOT be exposed (parent's end_reason is non-continuation)
|
||||
assert "child" not in result or "parent_session_id" not in result["child"], (
|
||||
"parent_session_id leaked through despite parent's end_reason being "
|
||||
"'user_stop' — would cause incorrect sidebar collapse."
|
||||
)
|
||||
# _lineage_root_id should NOT be set — chain doesn't span the boundary
|
||||
assert "child" not in result or "_lineage_root_id" not in result["child"]
|
||||
assert "child" not in result or "_compression_segment_count" not in result["child"]
|
||||
@@ -0,0 +1,105 @@
|
||||
"""Regression test for the v0.50.251 Opus pass-2 MUST-FIX on PR #1375.
|
||||
|
||||
Original PR #1375 stored cancelled tool calls under the message key
|
||||
`tool_calls`. That key is whitelisted by `_API_SAFE_MSG_KEYS` so the
|
||||
sanitize-for-API path forwarded them to the next-turn LLM call. But the
|
||||
captured entries use the WebUI internal shape ({name, args, done,
|
||||
duration, is_error}) — they don't have OpenAI/Anthropic's id +
|
||||
function: {name, arguments} envelope. Strict providers (OpenAI,
|
||||
Anthropic, Z.AI/GLM) would 400 on the malformed entries — turning a
|
||||
"data lost on cancel" bug into a "next message returns 400" bug.
|
||||
|
||||
The fix renames the key to `_partial_tool_calls` (underscore-prefixed
|
||||
private key NOT in the whitelist), so sanitize correctly strips it.
|
||||
The UI reads it via static/messages.js.
|
||||
|
||||
This test pins the invariant: a partial assistant message with
|
||||
`_partial_tool_calls` set must produce ZERO `tool_calls` after
|
||||
sanitize-for-API.
|
||||
"""
|
||||
import pathlib
|
||||
import sys
|
||||
|
||||
import pytest
|
||||
|
||||
REPO_ROOT = pathlib.Path(__file__).parent.parent.resolve()
|
||||
sys.path.insert(0, str(REPO_ROOT))
|
||||
|
||||
|
||||
def test_partial_tool_calls_field_not_forwarded_to_llm():
|
||||
"""The `_partial_tool_calls` field must not survive _sanitize_messages_for_api.
|
||||
Otherwise the malformed entries get sent to the LLM and cause 400 errors."""
|
||||
from api.streaming import _sanitize_messages_for_api
|
||||
|
||||
messages = [
|
||||
{"role": "user", "content": "do a search"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "Looking up...",
|
||||
"_partial": True,
|
||||
"_partial_tool_calls": [
|
||||
{"name": "web_search", "args": {"query": "x"}, "done": False},
|
||||
],
|
||||
"reasoning": "Let me think about this",
|
||||
},
|
||||
]
|
||||
sanitized = _sanitize_messages_for_api(messages)
|
||||
# The partial assistant message must NOT have _partial_tool_calls (private key).
|
||||
# It must NOT have tool_calls (would bypass the rename and 400 the LLM).
|
||||
# It must NOT have reasoning (not in whitelist).
|
||||
assistant_msgs = [m for m in sanitized if m.get("role") == "assistant"]
|
||||
assert assistant_msgs, "Sanitized output must include the assistant message"
|
||||
for m in assistant_msgs:
|
||||
assert "_partial_tool_calls" not in m, (
|
||||
"Sanitize-for-API must strip _partial_tool_calls — it's a UI-only key. "
|
||||
f"Got: {m}"
|
||||
)
|
||||
assert "tool_calls" not in m, (
|
||||
"Sanitize-for-API must NOT have tool_calls on a partial message — the "
|
||||
"captured entries use WebUI shape and would 400 on strict providers. "
|
||||
f"Got: {m}"
|
||||
)
|
||||
assert "reasoning" not in m, (
|
||||
"Sanitize-for-API must strip reasoning (not in _API_SAFE_MSG_KEYS). "
|
||||
f"Got: {m}"
|
||||
)
|
||||
|
||||
|
||||
def test_legitimate_tool_calls_are_preserved_for_completed_turns():
|
||||
"""Completed assistant turns with REAL tool_calls (with id + function envelope)
|
||||
must still pass through sanitize unchanged. The rename only affects
|
||||
cancel-partial messages, not normal completed turns."""
|
||||
from api.streaming import _sanitize_messages_for_api
|
||||
|
||||
messages = [
|
||||
{"role": "user", "content": "search"},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "I'll search.",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_abc",
|
||||
"type": "function",
|
||||
"function": {"name": "web_search", "arguments": '{"query":"x"}'}
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"tool_call_id": "call_abc",
|
||||
"name": "web_search",
|
||||
"content": "result",
|
||||
},
|
||||
]
|
||||
sanitized = _sanitize_messages_for_api(messages)
|
||||
assistant_msgs = [m for m in sanitized if m.get("role") == "assistant"]
|
||||
assert assistant_msgs, "Sanitized output must include the assistant message"
|
||||
assert assistant_msgs[0].get("tool_calls"), (
|
||||
"Legitimate tool_calls on completed turns must survive sanitize. "
|
||||
f"Got: {assistant_msgs[0]}"
|
||||
)
|
||||
# Must still have the OpenAI envelope shape
|
||||
tc = assistant_msgs[0]["tool_calls"][0]
|
||||
assert "id" in tc and "function" in tc, (
|
||||
f"tool_calls envelope must be preserved. Got: {tc}"
|
||||
)
|
||||
@@ -0,0 +1,161 @@
|
||||
"""Regression tests for /api/sessions lineage metadata used by sidebar collapse."""
|
||||
|
||||
import sqlite3
|
||||
import time
|
||||
|
||||
import pytest
|
||||
|
||||
import api.models as models
|
||||
from api.models import SESSIONS, STREAMS, Session, all_sessions
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate(tmp_path, monkeypatch):
|
||||
session_dir = tmp_path / "sessions"
|
||||
session_dir.mkdir()
|
||||
index_file = session_dir / "_index.json"
|
||||
state_db = tmp_path / "state.db"
|
||||
monkeypatch.setattr(models, "SESSION_DIR", session_dir)
|
||||
monkeypatch.setattr(models, "SESSION_INDEX_FILE", index_file)
|
||||
monkeypatch.setattr(models, "_active_state_db_path", lambda: state_db)
|
||||
SESSIONS.clear()
|
||||
STREAMS.clear()
|
||||
yield state_db
|
||||
SESSIONS.clear()
|
||||
STREAMS.clear()
|
||||
|
||||
|
||||
def _ensure_state_db(path):
|
||||
conn = sqlite3.connect(str(path))
|
||||
conn.executescript(
|
||||
"""
|
||||
CREATE TABLE sessions (
|
||||
id TEXT PRIMARY KEY,
|
||||
source TEXT,
|
||||
title TEXT,
|
||||
model TEXT,
|
||||
started_at REAL NOT NULL,
|
||||
message_count INTEGER DEFAULT 0,
|
||||
parent_session_id TEXT,
|
||||
ended_at REAL,
|
||||
end_reason TEXT
|
||||
);
|
||||
"""
|
||||
)
|
||||
return conn
|
||||
|
||||
|
||||
def _insert_state_row(conn, sid, *, parent=None, ended_at=None, end_reason=None, started_at=None):
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO sessions
|
||||
(id, source, title, model, started_at, message_count, parent_session_id, ended_at, end_reason)
|
||||
VALUES (?, 'webui', ?, 'openai/gpt-5', ?, 2, ?, ?, ?)
|
||||
""",
|
||||
(sid, sid, started_at or time.time(), parent, ended_at, end_reason),
|
||||
)
|
||||
conn.commit()
|
||||
|
||||
|
||||
def _save_webui_session(sid, *, title, updated_at):
|
||||
session = Session(
|
||||
session_id=sid,
|
||||
title=title,
|
||||
messages=[{"role": "user", "content": "hello"}, {"role": "assistant", "content": "hi"}],
|
||||
updated_at=updated_at,
|
||||
)
|
||||
session.save(touch_updated_at=False)
|
||||
return session
|
||||
|
||||
|
||||
def test_all_sessions_exposes_state_db_lineage_metadata_for_webui_json_sessions(_isolate):
|
||||
"""PR #1358 can only collapse rows when /api/sessions exposes lineage keys."""
|
||||
conn = _ensure_state_db(_isolate)
|
||||
t0 = time.time() - 100
|
||||
try:
|
||||
_save_webui_session("lineage_api_root", title="Hermes WebUI", updated_at=t0)
|
||||
_save_webui_session("lineage_api_tip", title="Hermes WebUI #2", updated_at=t0 + 10)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_root",
|
||||
started_at=t0,
|
||||
ended_at=t0 + 5,
|
||||
end_reason="compression",
|
||||
)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_tip",
|
||||
parent="lineage_api_root",
|
||||
started_at=t0 + 6,
|
||||
)
|
||||
|
||||
rows = {row["session_id"]: row for row in all_sessions()}
|
||||
|
||||
assert rows["lineage_api_tip"].get("parent_session_id") == "lineage_api_root"
|
||||
assert rows["lineage_api_tip"].get("_lineage_root_id") == "lineage_api_root"
|
||||
assert rows["lineage_api_tip"].get("_compression_segment_count") == 2
|
||||
assert "_lineage_root_id" not in rows["lineage_api_root"]
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def test_non_compression_state_db_parent_does_not_create_sidebar_lineage(_isolate):
|
||||
conn = _ensure_state_db(_isolate)
|
||||
t0 = time.time() - 100
|
||||
try:
|
||||
_save_webui_session("lineage_api_plain_parent", title="Parent", updated_at=t0)
|
||||
_save_webui_session("lineage_api_plain_child", title="Child", updated_at=t0 + 10)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_plain_parent",
|
||||
started_at=t0,
|
||||
ended_at=t0 + 5,
|
||||
end_reason="user_stop",
|
||||
)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_plain_child",
|
||||
parent="lineage_api_plain_parent",
|
||||
started_at=t0 + 6,
|
||||
)
|
||||
|
||||
rows = {row["session_id"]: row for row in all_sessions()}
|
||||
|
||||
# parent_session_id must NOT be exposed when the parent's end_reason
|
||||
# is not a continuation (compression / cli_close). The frontend's
|
||||
# _sessionLineageKey would otherwise group two children sharing a
|
||||
# `user_stop` parent under the same key — incorrect collapse.
|
||||
# (Tightened in v0.50.251 per Opus SHOULD-FIX 1.)
|
||||
assert "parent_session_id" not in rows["lineage_api_plain_child"]
|
||||
assert "_lineage_root_id" not in rows["lineage_api_plain_child"]
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
|
||||
def test_cli_close_parent_preserves_cross_surface_continuation_lineage(_isolate):
|
||||
conn = _ensure_state_db(_isolate)
|
||||
t0 = time.time() - 100
|
||||
try:
|
||||
_save_webui_session("lineage_api_cli_parent", title="Hermes WebUI #8", updated_at=t0)
|
||||
_save_webui_session("lineage_api_webui_child", title="Hermes WebUI #8", updated_at=t0 + 10)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_cli_parent",
|
||||
started_at=t0,
|
||||
ended_at=t0 + 5,
|
||||
end_reason="cli_close",
|
||||
)
|
||||
_insert_state_row(
|
||||
conn,
|
||||
"lineage_api_webui_child",
|
||||
parent="lineage_api_cli_parent",
|
||||
started_at=t0 + 6,
|
||||
)
|
||||
|
||||
rows = {row["session_id"]: row for row in all_sessions()}
|
||||
|
||||
assert rows["lineage_api_webui_child"].get("parent_session_id") == "lineage_api_cli_parent"
|
||||
assert rows["lineage_api_webui_child"].get("_lineage_root_id") == "lineage_api_cli_parent"
|
||||
finally:
|
||||
conn.close()
|
||||
Reference in New Issue
Block a user