Commit Graph

87 Commits

Author SHA1 Message Date
fxd-jason d2d464aac3 feat(clarify): SSE long-connection for real-time clarify notifications (#1355)
Replaces the 1.5s HTTP polling loop for clarify with a Server-Sent Events endpoint at /api/clarify/stream that pushes clarify events to the browser instantly. Mirrors the approval SSE pattern from v0.50.248 (#1350) including all the correctness lessons:

- Atomic subscribe + initial snapshot under clarify._lock
- _clarify_sse_notify called inside _lock for ordering guarantees (no notify-out-of-order race)
- Notify passes head=q[0].data (head-fidelity, not the just-appended entry)
- resolve_clarify also calls notify after pop so trailing clarifies surface immediately (no stuck-clarify bug)
- Empty-state notify with None,0 after pop-empty so frontend hides the card
- 30s keepalive comments, _CLIENT_DISCONNECT_ERRORS handling
- Bounded queue (maxsize=16) with silent drop on full
- Frontend: EventSource with automatic 3s HTTP polling fallback on onerror

Co-authored-by: fxd-jason <wujiachen7@gmail.com>
2026-04-30 21:32:51 +00:00
nesquena-hermes bbdacdca5c fix: context window indicator overflow (#1356)
- api/streaming.py SSE payload now falls back to agent.model_metadata.get_model_context_length when compressor doesn't supply context_length (mirrors the session-save fallback shipped in v0.50.247).
- api/streaming.py also falls back to s.last_prompt_tokens to avoid using the cumulative input_tokens counter.
- static/ui.js tracks rawPct separately from pct and shows '(context exceeded)' tooltip when rawPct > 100 instead of misleading '100% used (0% left)'.
- static/messages.js clears 'Uploading...' composer status after upload completes.

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-30 21:32:45 +00:00
nesquena-hermes e68f74ac99 fix(approval): close SSE notify-ordering, head-fidelity, and trailing-approval gaps (Opus MUST-FIX A/C/D)
Pre-release Opus review caught three correctness bugs in the original
PR #1350 SSE wiring beyond the snapshot/subscribe race:

A) **Notify-ordering race (MUST-FIX A):** _approval_sse_notify took _lock
   only for the subscriber-list snapshot, then released it before
   put_nowait. With two parallel submit_pending calls, T2's notify
   could fire before T1's, leaving the UI showing pending_count=1 while
   the server actually had 2 queued.

C) **Trailing approval lost (MUST-FIX C):** _handle_approval_respond
   never called _approval_sse_notify after popping. With parallel
   tool-call approvals (#527), a second approval queued behind the one
   being responded to was invisible until the next event ever fired —
   in practice, the agent thread parked on it would appear hung.

D) **Payload showed tail not head (MUST-FIX D):** payload built from
   the just-appended entry instead of queue[0]. /api/approval/pending
   returns the head; SSE returned the tail. Diverging contracts.

Fix:
- Split into _approval_sse_notify_locked (caller holds _lock, no
  internal locking) and _approval_sse_notify (convenience wrapper).
- submit_pending: call _locked variant inside the queue-mutation lock,
  passing queue_list[0] as head.
- _handle_approval_respond: call _locked variant inside the pop lock,
  passing the new head (or None/0 if queue is empty).
- Restore fallback poll to 1500ms (was bumped to 3000ms; degraded-mode
  parity with v0.50.247 is more important than save 1.5s of polling).

New regression tests in tests/test_pr1350_sse_notify_correctness.py:
- test_second_submit_pending_sends_head_not_tail (D)
- test_respond_to_first_pushes_second_as_new_head (C)
- test_respond_to_only_pending_pushes_empty_state (C edge)
- test_pending_count_is_monotonic_under_contention (A)

Updated test_approval_sse.py to pin the new contract:
- _approval_sse_notify_locked(session_key, head, total)
- 1500ms fallback interval

Total: 3411 tests passing.

Co-authored-by: jasonjcwu <jasonjcwu@users.noreply.github.com>
2026-04-30 18:45:15 +00:00
fxd-jason 932694aec6 feat(approval): SSE long-connection for real-time approval notifications (#1350)
Replaces the 1.5s HTTP polling loop with a Server-Sent Events endpoint
at /api/approval/stream that pushes approval events to the browser
instantly. The backend uses a thread-safe subscriber registry
(_approval_sse_subscribers) with bounded queues to prevent memory
leaks from slow clients. Frontend uses EventSource with automatic
fallback to 3s HTTP polling on SSE error.

- Backend: subscribe/unsubscribe/notify lifecycle in api/routes.py
- New route: GET /api/approval/stream?session_id=
- submit_pending() now calls _approval_sse_notify() after queue append
- Frontend: EventSource with onerror -> _startApprovalFallbackPoll()
- 30s keepalive comments, _CLIENT_DISCONNECT_ERRORS handling
- 42 new tests (static analysis + unit + concurrency)

Co-authored-by: jasonjcwu <jasonjcwu@users.noreply.github.com>
2026-04-30 18:31:42 +00:00
nesquena-hermes 3f838fc31a release: v0.50.244 (#1308)
release: v0.50.244

Batch release of 4 PRs:

- #1303 (@fecolinhares) — TTS playback of agent responses via Web Speech API.
  Per-message speaker button + auto-read toggle + voice/rate/pitch in
  Settings. localStorage-only state. Closes #499.

- #1304 — Stale saved session 404 cleanup + structured api() errors.
  Salvaged from #1084. Independently approved on 358275e.

- #1306 — Cmd/Ctrl+K works while a conversation is busy.
  Salvaged from #1084. Independently approved on 2e8a239.

- #1307 — Sienna skin (warm clay & sand earth palette).
  Salvaged from #1084. Independently approved on 5cd79c8.

Tests: 3290 passed, 2 skipped, 3 xpassed, 0 failures (was 3254; +36 tests).

Independently reviewed and approved by nesquena (commit 47f0e0d). End-to-end
trace verified the TTS flow; security audit confirmed SpeechSynthesisUtterance
is plain-text-only with no XSS surface; behavioural harness confirmed
_stripForTTS handles all 12 markdown-stripping cases; bounds clamping on
rate/pitch verified; opt-in behavior verified.
2026-04-29 21:34:27 -07:00
Hermes Agent 4ee80425f2 Merge remote-tracking branch 'refs/remotes/pr/1229' into stage/batch-v0.50.238 2026-04-29 15:17:57 +00:00
Frank Song eb9614854e Refine embedded terminal card entrypoint 2026-04-29 04:37:31 +00:00
Andy b0aed07fe0 fix: keep clarify countdown steady 2026-04-29 04:32:52 +00:00
Andy 47e91ee84b fix: handle clarify review edge cases 2026-04-29 04:32:52 +00:00
Andy 9fabd12e41 fix: preserve clarify drafts on timeout 2026-04-29 04:32:40 +00:00
starship-s 9d5480565f fix: remove deprecated btnCancel; localise composer tooltips with disabled reason branching
- Drop btnCancel element and all JS show/hide call sites across
  boot.js, messages.js, sessions.js, ui.js (superseded by single
  primary action button)
- Remove .cancel-btn CSS rules including mobile media-query override
- Route updateSendBtn() title/aria-label through t() with English
  fallbacks; add composer_send/queue/interrupt/steer/stop keys to all
  7 locales (en, ru, es, de, zh, zh-Hant, ko)
- Branch disabled-state tooltip on reason: clarify lock, compression
  running, or idle-empty, each with its own i18n key
- Update test_sprint10 / test_sprint36 to reflect single-button model:
  assert btnSend present and id="btnCancel" absent; replace
  test_hides_cancel_button with test_clears_composer_status
2026-04-29 04:31:55 +00:00
Frank Song f384368ee2 fix(sessions): preserve unread dots after compression 2026-04-29 04:31:14 +00:00
Frank Song 6b04ae0254 Fix unread markers for local inflight completions 2026-04-29 04:31:13 +00:00
Frank Song b488a0d1b0 Fix unread markers for background completions 2026-04-29 04:31:13 +00:00
Frank Song 5d16ff7522 Fix background completion unread markers 2026-04-29 04:31:13 +00:00
yzp12138 f35d7786e5 fix: send image uploads as native multimodal inputs 2026-04-28 23:18:51 +08:00
nesquena-hermes 24b1e6f3fc fix+feat: batch v0.50.236 — OAuth providers fix, profile switch UX, YOLO mode (#1211)
fix+feat: batch v0.50.236 — OAuth providers fix, profile switch UX, YOLO mode (#1211)

Merges PRs #1208, #1209, #1210 (#1152 rebased):

- fix(providers): OAuth provider cards show correct Configured status in Settings.
  get_providers() was discarding has_key=True from _provider_has_key() for OAuth
  providers, hiding config.yaml tokens. Also fixed filter excluding all OAuth providers
  from the Settings panel. Surfaces auth_error string. (closes #1202)

- ux(profiles): profile chip shows spinner and new name immediately on switch.
  Optimistic name update + .switching CSS class + chip disabled + finally cleanup.
  populateModelDropdown() and loadWorkspaceList() now parallelized via Promise.all.

- feat: YOLO mode toggle — skip all approvals per session.
  /yolo slash command, "Skip all this session" button on approval cards,
  amber  pill indicator in composer footer. Session-scoped, in-memory.
  Full i18n: en, ru, es, de, zh, ko, zh-Hant. (closes #467)
  Original author: @bergeouss (PR #1152)

Tests: 2837 passed (+50 new tests vs previous release)
QA harness: 20/20 passed + all browser API checks passed
2026-04-27 22:56:12 -07:00
nesquena-hermes a091be6a8e fix: batch v0.50.229 — session perf, ephemeral sessions, iOS zoom (#1183)
Merged as v0.50.229. 2678 tests passing. Browser QA 21/21.

All three PRs were independently reviewed and approved by @nesquena with reviewer commits pulled in:
- #1181 (#1158): `d974388` (stale-response race in _loadOlderMessages)
- #1182: `7e20006` (full-scan fallback path consistency)
- #1180: `a5ad154` (regression test for iOS zoom threshold)

Thanks @jasonjcwu (#1158)!
2026-04-27 16:27:03 -07:00
nesquena-hermes 8b8ff3328a fix: batch triage — 12 contributor PRs (v0.50.227) (#1168)
Merged as v0.50.227. 2634 tests passing, browser QA 21/21 (desktop + mobile). Full attribution below.

Thanks to all 12 contributors:
@jundev0001 (#1138), @franksong2702 (#1142, #1157, #1162), @dso2ng (#1143), @bergeouss (#1145, #1146, #1156, #1159), @jasonjcwu (#1149), @ccqqlo (#1161), @frap129 (#1165)

Two fixes applied during integration and two more by the independent reviewer (@nesquena):
- messages.js: per-turn cost delta capture order (#1159)
- workspace.py: symlink target blocked-roots check + HOME sanity guard (#1149, #1165)
- panels.js: cron unread counter bookkeeping (in-loop increment bug)
- tests/test_symlink_cycle_detection.py: register workspace before session/new
2026-04-27 13:34:59 -07:00
nesquena-hermes 58ad315dca v0.50.216: compression chains, renderer fixes, HTML preview, approval z-index, /steer fix, reasoning chip (#1075)
* fix(workspace): add .html/.htm to MIME_MAP so HTML preview renders correctly

MIME_MAP was missing entries for .html and .htm. The server fell back to
Content-Type: application/octet-stream, which browsers refuse to render as
HTML in an iframe — causing a blank white preview.

The rest of the pipeline was already correct: the iframe exists in
static/index.html, openFile() in static/workspace.js routes .html to
showPreview('html'), and _handle_file_raw() in api/routes.py sets the
correct CSP sandbox header when ?inline=1 is present. The only missing
piece was the MIME type.

* test(workspace): lock in MIME_MAP entry for .html/.htm

PR #1070 added .html/.htm → text/html to MIME_MAP in api/config.py
to fix the blank workspace HTML preview iframe. Without a direct
assertion on the MIME_MAP entries, the fix could silently regress
(the existing test_779_html_preview.py tests cover the iframe wiring,
the inline=1 query handling, and the CSP sandbox header — but none of
them touch MIME_MAP itself).

Add a single regression test that asserts MIME_MAP['.html'] and
MIME_MAP['.htm'] are both 'text/html' so any future removal of those
entries fails CI immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(composer): raise .approval-card.visible z-index above .queue-card

.queue-card has z-index:2. .approval-card.visible had no z-index, so the
queue flyout would render on top of the approval card when both were visible
simultaneously — obscuring the Allow/Deny buttons.

Fix: add z-index:3 to .approval-card.visible so approvals always render
above the queue flyout. Approval is a blocking, security-relevant interaction
and must never be obscured by passive UI elements.

* test(composer): pin approval-card z-index > queue-card invariant

PR #1071 raises .approval-card.visible to z-index:3 so the security-
relevant Allow / Deny buttons stay clickable when the queue flyout is
also open. Without a regression test, a future CSS edit could silently
drop the z-index back below queue-card (z-index:2) and reintroduce the
bug — there is no automated UI test covering this stacking interaction.

Add a focused regex check that pins the invariant:
.approval-card.visible z-index must be strictly greater than
.queue-card z-index.

Modeled on the existing CSS-regex regression style in
tests/test_mobile_layout.py (test_profile_dropdown_not_clipped_by_overflow).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: intercept /steer /interrupt /queue before busy-mode routing in send()

Root cause: slash commands entered while the agent is busy never reached
the command dispatcher. send() enters the busy block and returns early at
line ~50, so the slash-command intercept (~line 56) is never reached.
The text was queued as a plain message. When it drained after the turn
ended, cmdSteer / cmdInterrupt ran on an idle session, saw no active stream,
and showed "No active task to stop."

Fix: at the top of the busy block, before checking busyMode, check if the
text starts with / and is one of the three control commands. If so, dispatch
the handler immediately and return. This lets the user type /steer, /interrupt,
or /queue at any time — including while the agent is mid-stream — and have
them execute against the live session.

Two new regression tests added:
- test_slash_commands_intercepted_before_busymode_routing: verifies the
  intercept appears before the busyMode routing in the busy block
- test_steer_intercept_calls_handler_directly: verifies the intercept calls
  _bc.fn(_pc.args) and returns, not queues

* test(busy-intercept): pin sync input-clear before await in slash intercept

PR #1072's intercept clears the msg input before awaiting the handler.
Order matters: if the await happens first (or if the clear is moved
inside the handler), the input still shows '/steer foo' for the duration
of the await. A reflexive second Enter press during that window — common
while waiting for the toast — re-runs send(): either re-fires the
handler (double-steer) or, if the turn just ended, falls through to the
non-busy slash dispatcher and drops a confusing "No active task to stop."

Add test_steer_intercept_clears_input_before_await pinning the order so
this UX invariant cannot silently regress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: update steer i18n and settings copy — steer no longer interrupts

With the real /steer implementation (agent.steer() via /api/chat/steer),
steer injects a correction mid-turn WITHOUT interrupting the current stream.
The previous copy said "falls back to interrupt", "Steer (interrupt + send)",
etc. — accurate only for the old placeholder, not the real implementation.

Changes across all 6 locales (en/ru/es/de/zh/zh-Hant):
  cmd_steer:                  "falls back to interrupt" removed
  settings_busy_input_mode_steer: "interrupt + send" → "mid-turn correction"
  cmd_steer_fallback:         "interrupted" → "queued for next turn"
  busy_steer_fallback:        "interrupted instead" → "queued for next turn"
  settings_desc_busy_input_mode: "currently falls back to interrupt" removed

Also:
  static/index.html: inline fallback text updated to match
  static/commands.js: internal comment clarified (fallback = queue+cancel,
                      not "interrupt mode" which implies the primary action)

* fix(renderer): group consecutive blockquote lines into single element

Root cause: the old rule `s.replace(/^> (.+)$/gm, ...)` had three bugs:
  1. `.+` required at least one character — bare `>` lines (blank
     continuation lines) did not match and passed through as literal `>`
  2. Each matching line became its own `<blockquote>` element — a 10-line
     blockquote produced 10 stacked `<blockquote>` tags with no grouping
  3. When a fenced code block sat inside a blockquote, the fence-stash
     pass consumed the code content and left orphaned `>` lines that the
     old `.+` pattern could not match

Fix: replace the single-line regex with a group-based approach that matches
one or more consecutive `>` lines as a single block, strips the `>` prefix
from each line, passes each non-empty line through inlineMd(), turns blank
`>` lines into `<br>`, and wraps the entire group in one `<blockquote>`.

14 regression tests added covering:
- Single-line blockquotes (regression)
- Multi-line grouping (2 and 10 lines)
- Two separate blockquotes staying separate
- Bare `>` and `>text` (no space) edge cases
- Blank continuation lines → <br>
- Bold / italic / inline-code inside blockquotes
- Blockquote followed by normal paragraph

* fix(renderer): drop empty trailing line from blockquote match

The new group-based blockquote rule introduced in this PR captures the
trailing newline in its (?:\n|$) clause. After block.split('\n') that
trailing newline produces an empty final element. The original filter
only dropped lone bare '>' artifacts on the last line, so the empty
final element survived, and the .map(blank → '<br>') step turned it
into a phantom <br> immediately before </blockquote>.

Visible symptom: any blockquote whose source ends with \n (the common
case — a quote followed by another paragraph or end-of-message) renders
with an extra blank line at the bottom of the quote.

Reproducer:
  '> Hello\n\nThe rest of the message.'
    → '<blockquote>Hello\n<br></blockquote>\nThe rest of the message.'
                          ^^^ phantom <br>

Fix: replace the single-line filter with a while-loop that pops trailing
lines while they are either empty OR a bare '>'. This matches the
intent the Python test mirror in tests/test_blockquote_rendering.py
already had (the mirror was correct; the JS was not — that's why
the original tests passed despite the bug).

Also add four new regression tests in TestNoPhantomTrailingBr that pin
the no-trailing-<br> invariant for the common shapes:
  - input ending with \n
  - quote followed by paragraph (the real-world case)
  - multi-line quote ending with \n
  - quote with blank continuation + trailing \n (internal <br> stays,
    trailing <br> does not)

Verified end-to-end with node against the actual JS regex.
244 renderer-adjacent tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(renderer): comprehensive markdown fixes — strikethrough, task lists, CRLF, nested blockquotes

Five additional fixes on top of the blockquote grouping from the initial commit:

1. CRLF normalisation: strip \r\n → \n at start of renderMd so Windows
   line endings do not produce stray \r characters in rendered output

2. Strikethrough: ~~text~~ → <del>text</del> in both inlineMd() (for use
   inside blockquotes/lists) and the outer pass (for plain paragraphs).
   Added <del> to SAFE_TAGS and SAFE_INLINE so it is not HTML-escaped.

3. Task lists: - [x] / - [ ] items in unordered lists render as /☐
   via task-done/task-todo span wrappers. Checks [X] (uppercase) too.

4. Nested blockquotes: >> / >>> etc. now recurse so each level gets its
   own <blockquote> element rather than passing through as literal >.
   Implemented by extracting the blockquote rule into _applyBlockquotes()
   which calls itself recursively on the stripped inner content.

5. Lists inside blockquotes: > - item now renders <ul><li> inside the
   blockquote instead of a literal "- item" string. Task list items work
   inside blockquotes too (> - [x] done →  inside <blockquote><ul>).

Also fixed test_issue342.py search window (5000→10000 chars) — the CRLF
strip at the top of renderMd pushed the autolink regex past the old limit.

68 new tests in test_renderer_comprehensive.py + test_blockquote_rendering.py
covering all constructs, edge cases, and combinations.

* fix(renderer): restore space in blockquote prefix-strip regex

Commit 04e7b53 changed the blockquote prefix-strip regex from
  /^>[ \t]?/   (consume "> ", "\t>", or just ">")
to
  /^>[\t]?/    (only consume "\t>" or just ">")

The space character was dropped from the character class. Since
practically every blockquote an LLM produces is "> " (greater-than
followed by a space), this leaves a leading space artifact on every
stripped blockquote line. Worse, the leading space breaks the
list-detection regex `^(?:  )?[-*+] ` inside the new `_applyBlockquotes`
helper — that regex requires either zero or two leading spaces, never
one — so the new "list inside blockquote" feature never fired for
the canonical input shape `> - item`.

Reproducer (against the actual ui.js via node, before the fix):
  > Hello world         → <blockquote> Hello world</blockquote>
                                       ^ phantom leading space
  > Steps:              → <blockquote>Steps:
  > - one                  - one
  > - two                  - two</blockquote>
                          ^ literal text, NOT a <ul>; lists-in-quote feature broken
  > - [x] done          → blockquote with literal "[x] done", no checkbox span

Tests passed despite the bug because tests/test_blockquote_rendering.py
and tests/test_renderer_comprehensive.py validate against a Python
mirror (`_apply_blockquotes`) whose strip regex is `^>[ \t]?` — i.e.
the mirror is correct, the JS is not, and the static-mirror tests
can't catch the divergence. Same shape of bug as commit 94d63d0
(phantom <br> in trailing line) where the mirror was right and the JS
was wrong.

Fix: restore the space character in the strip regex's character class.

Add tests/test_renderer_js_behaviour.py — 11 tests that drive the
ACTUAL renderMd via node and assert on rendered output for the most
common LLM shapes (single-line quote, multi-line quote, list inside
quote, task list inside quote, nested >>>, strikethrough inside and
outside quote, top-level task list, quote followed by heading,
multi-paragraph quote with list, CRLF normalisation).

Verified: the buggy regex makes 6 of those 11 tests fail; the corrected
regex makes all 11 pass.

Suite: 2354 passed, 0 new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Collapse agent session compression chains

* Restore upstream changelog entries

* fix(agent_sessions): bubble active compression chains to top by tip last_activity

The original PR merge kept the chain head's id/title/started_at and overrode
id/model/message_count/ended_at/end_reason from the tip — but did NOT override
last_activity. Since the projected list is sorted by last_activity DESC and
the WebUI sidebar surfaces updated_at = last_activity, an actively-used
compression chain whose tip is being edited NOW would sort by the ROOT's
old last_activity and fall below recently touched standalone sessions.

Reproducer (with the harness against actual code, before the fix):
  - root: started 30 days ago, last msg 30 days ago
  - tip:  started 28 days ago (parent_session_id=root), last msg 5 seconds ago
  - standalone: last msg 2 days ago

  Sidebar order with original PR:
    [0] standalone  (48h ago)
    [1] active_tip  (last_activity=root's 720h ago)  ← wrong

  Sidebar order after fix:
    [0] active_tip  (last_activity=tip's 0h ago)     ← correct
    [1] standalone  (48h ago)

This matches Hermes Agent's own list_sessions_rich projection at
hermes_state.py:903-909, which overrides "last_active" from the tip
exactly so that the agent CLI's session list orders the same way.

Add ``last_activity`` to the merge-from-tip key list, update the existing
test_compression_chain_collapses_to_latest_tip_in_sidebar assertion to
expect tip-derived updated_at, and add
test_compression_chain_bubbles_to_top_by_tip_activity locking in the
bubble-to-top invariant — without this regression test the previous
behaviour passed CI because no test exercised the sort order against a
mixed set of chains and standalone sessions.

The chain head's started_at (created_at) and title remain preserved, so
users can still find the conversation by its original date and name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: v0.50.216 release notes and version bump

Compression chains, renderer fixes, HTML preview, approval z-index, /steer fix.

* chore: gitignore local-only review harness directory

Adds .local-review/ to .gitignore so renderer drivers, sample inputs,
fixture builders, and other reviewer scratch files do not accidentally
get committed. Nothing under that path is ever shared in the repo;
keeping the entry tracked makes the boundary explicit for any future
contributor who creates the directory locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Keep reasoning chip visible for None effort

* test(reasoning): pin chip render output via node, not just source regex

The PR's static checks in test_reasoning_chip_btw_fixes.py validate the
shape of _applyReasoningChip (no display='none' literal, the right
classList.toggle call exists, the right label literals are in the
function body) but pass even if the runtime detail is wrong — for
example if `inactive` were inverted, _normalizeReasoningEffort
mishandled whitespace, or _formatReasoningEffortLabel returned the
wrong literal for an unknown input.

Add tests/test_reasoning_chip_js_behaviour.py — 11 tests that drive
the actual _applyReasoningChip() via node and assert on the rendered
DOM state for each effort value:

  TestChipAlwaysVisible
    - empty / null  -> "Default" label, inactive=true
    - "none"        -> "None" label, inactive=true
    - "low"/"high"  -> verbatim label, inactive=false
  TestNormalizationEdgeCases
    - "NONE"        -> normalises to "None"
    - "  none  "    -> trims and normalises
    - unknown junk  -> falls through visible, never hidden
  TestTitleAttributeAccessibility
    - title attribute carries the human-readable label for tooltip /
      screen-reader use

Sanity-checked against master's pre-fix ui.js: 11/11 fail (bug caught).
Against this PR's ui.js: 11/11 pass.

This pattern (drive the actual JS via node) caught two regex-only
regressions in PR #1073 where the Python mirror was correct while the
JS was broken. Same protection added here so the chip-visibility
contract can't silently break in a future refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add #1074 to v0.50.216 changelog, bump test count to 2428

* fix(i18n): restore broken Unicode in Russian and Spanish steer strings

Commit 56c7a14 (fix: update steer i18n and settings copy) accidentally
stripped the `\u` prefix from Unicode escape sequences in two locales,
producing garbled literal hex strings visible to users:

  Spanish (es):
    - cmd_steer:                   correcci00f3n  → corrección
    - cmd_steer_fallback:          2014 en cola   → — en cola
    - busy_steer_fallback:         2014 en cola   → — en cola
    - settings_desc_busy_input_mode: qu00e9, est00e1, correcci00f3n → qué, está, corrección
    - settings_busy_input_mode_steer: correcci00f3n  → corrección

  Russian (ru):
    - settings_desc_busy_input_mode: the entire Cyrillic string was
      replaced with raw 4-hex-char code-points without the \u prefix
      (041e043f... instead of actual Cyrillic). Decoded:
      "Определяет поведение при отправке сообщения во время работы
      агента. Очередь ждёт; Прерывание отменяет и начинает заново;
      Steer внедряет коррекцию без прерывания."

Fix: write the correct characters directly (UTF-8 is the file encoding
so embedding them literally is cleaner than \u escapes for long text).

All other locales (en, de, zh, zh-Hant) were not affected — confirmed
by grepping for bare hex run-ons in the updated file.

Verified: node --check static/i18n.js passes; full pytest suite green
(2365 passed, 47 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: remove duplicate compression chain entry from [Unreleased]

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Frank Song <franksong2702@gmail.com>
2026-04-25 21:06:31 -07:00
nesquena-hermes 3d96dc1498 v0.50.215: real /steer via agent.steer() — mid-turn correction without interrupt (#1069)
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: nesquena <nesquena@users.noreply.github.com>
2026-04-25 19:21:00 -07:00
nesquena-hermes 520034c071 v0.50.214: busy input modes + queue/interrupt/steer slash commands (#1067)
* feat: busy input modes with queue/interrupt/steer slash commands

- Add busy_input_mode setting (queue/interrupt/steer) to config defaults
- Add /queue, /interrupt, /steer slash commands with handlers
- Modify send() to respect busy_input_mode (interrupt cancels and resends, steer falls back to interrupt with toast, queue preserves existing behavior)
- Add settings dropdown in settings panel with load/save/apply wiring
- Initialize window._busyInputMode at boot and on settings save
- Add 17 i18n keys across all 6 locale blocks (en/ru/es/de/zh/zh-Hant)
Addresses #720

* test: 17 regression tests for busy_input_mode + slash commands

PR description noted manual testing only. Added structural tests
matching the pattern used by recent contributor PRs (#1010, #1011,
#1018, #1022, #1058) so future refactors don't silently regress
the wiring:

  Backend (api/config.py):
    - default 'queue' is set in _DEFAULT_SETTINGS
    - enum validator restricts to {queue, interrupt, steer}

  Slash commands (static/commands.js):
    - /queue, /interrupt, /steer all registered with correct fns
    - /interrupt and /steer set noEcho:true (the queued payload
      becomes the visible turn, not the slash invocation)
    - cmdQueue requires S.busy
    - cmdInterrupt + cmdSteer call queueSessionMessage before
      cancelStream (otherwise the drain has nothing to pick up)

  send() busy branch (static/messages.js):
    - reads window._busyInputMode
    - calls cancelStream on interrupt/steer
    - queues before cancelling (ordering invariant)

  Boot init + panels.js wiring (static/boot.js, static/panels.js):
    - both success and fallback paths set window._busyInputMode
    - load/save/apply path threads busy_input_mode through

  i18n (static/i18n.js):
    - all 17 new keys present in each of the 6 locale blocks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: add noEcho:true to /queue; clear pendingFiles in all three slash handlers

1. /queue was missing noEcho:true — the dispatcher would echo the raw slash text
   as a user bubble, then the drain would send the queued message, causing a
   double-bubble in the conversation (#840 pattern).

2. cmdQueue, cmdInterrupt, and cmdSteer all captured S.pendingFiles into the queue
   payload but never cleared S.pendingFiles or called renderTray(). Staged files
   would remain in the tray and be re-attached on the next send(), duplicating
   attachments. Fix: add S.pendingFiles=[];renderTray() after updateQueueBadge().

3. test_all_three_busy_commands_are_no_echo: expanded to cover /queue (was only
   interrupt + steer), now documents that all three must set noEcho:true.

4. test_slash_commands_clear_pending_files: new test that all three handlers clear
   S.pendingFiles and call renderTray() after enqueuing.

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>

* docs: v0.50.214 release notes and version bump

---------

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-25 18:51:06 -07:00
nesquena-hermes 3ce7844a7a feat(queue): Codex-style message queue flyout above composer (#1040)
* chore: apply pending #965 queue flyout patches on local master

Queue flyout implementation (PR #965 — pending merge) applied on top of
upstream v0.50.205. Features:
- Queue card slides up from behind composer (approval-card pattern)
- Lucide icons via li(), CSS class system, no inline SVG dumps
- Drag-to-reorder by _queued_at timestamp (survives re-renders)
- Inline contenteditable edit with focus guard and blur-commit
- Combine preserves first item files, merge immediate (no 200ms race)
- Files/model compact badges per item
- Hide/expand via header chevron + composer pill + titlebar chip
- All 3 expand paths sync correctly
- border-bottom CSS order fixed, fingerprint improved, _dragTs guards

CF CSP domains also applied (deployment-specific, not in upstream PR).

* fix(queue): harden merge closure, toggleQueue sid, and drain flash

- mergeBtn _doMerge now reads live queue (_getSessionQueue) instead of stale closure q
- toggleQueue reads activeSid from S.session at call time, not captured param
- updateQueueBadge defers chips.innerHTML='' by 360ms so slide-out transition completes before content clears

* style(queue): contain:paint on inner, pill fade-in animation

* feat(queue): pill outside composer, compact collapsed state matching card width

- Move #queuePill out of .composer-box to between .composer-flyout and .composer-box
- Pill styled as compact queue-card-inner (same border, radius:14px 14px 0 0, no border-bottom)
- Pill width matches card inner: max-width:calc(var(--msg-max)-40px), centered
- Pill stays visible until user re-expands or queue drains (updateQueueBadge no longer
  hides pill when card is manually collapsed)
- Remove all queue-active/queue-pill-active composer modifications — composer untouched
- Fix: mergeBtn reads live queue not stale closure
- Fix: toggleQueue uses S.session.session_id at call time not captured param
- Fix: chips.innerHTML deferred 360ms on drain to avoid empty-card flash

* fix(queue): collapsed state persists + cross-session DOM isolation

- Add _queueCollapsed[sid] flag: set by hideBtn, cleared by pill expand / queue drain
- _renderQueueChips respects flag — no longer reopens card when new message queued while collapsed
- updateQueueBadge else-branch: DOM mutations now gated on sid===active session
- _syncQueueTitlebar only fires for active session in else-branch
- Fixes Opus/Codex-identified bugs: pill auto-reopen and cross-session DOM corruption

* fix(queue): proper pill wrapper matching queue-card structure

- Add .queue-pill-outer div wrapper (max-width:var(--msg-max); padding:0 20px)
  identical to .queue-card outer — positions pill button at exact card-inner width
- .queue-pill button fills slot with width:100%
- Removes hardcoded 740px — width is derived correctly from the same CSS variables
  the card uses, scales with --msg-max across all viewports
- JS toggles .show on pillOuter (parentElement), not on pill button directly

---------

Co-authored-by: Basit Mustafa <basit.mustafa@gmail.com>
2026-04-25 14:21:50 -07:00
nesquena-hermes ad8e10304c v0.50.207: batch of 10 PRs — TPS stat, SSE guard, session polish, cron UX, folder create, model errors, session speed, title gen (#1031)
* fix: remove orphaned i18n keys from top-level LOCALES object

Three Traditional Chinese translation keys (cmd_status, memory_saved,
profile_delete_title) were placed outside any locale block between the
en and ru blocks in static/i18n.js. They became top-level properties
of the LOCALES object, causing them to appear as invalid language
options in the Settings > Preferences dropdown.

The correct translations already exist in the zh-Hant locale block.

Fixes #1008

* fix: block stale SSE events from polluting new session's DOM

- appendThinking(): guard with !S.session||!S.activeStreamId to drop
  events from a previous session's SSE stream during a session switch
- appendLiveToolCard(): same guard for consistency
- finalizeThinkingCard(): scroll thinking-card-body to top when
  scroll is pinned, so completed response is immediately visible
- appendThinking(): auto-scroll thinking card body to bottom while
  streaming if user is watching (scroll pinned)

* Fix empty agent sessions in sidebar

* fix: resolve cron UI UX issues — icon ambiguity, toast overlap, running status

Fixes #995 — three sub-issues in the Cron Jobs UI:

1. Dual play icons ambiguous: Resume button now shows a distinct
   play+bar icon (play triangle + vertical line) instead of the
   identical triangle used by Run now.

2. Toast notification overlapping header buttons: Added
   position:relative; z-index:10 to .main-view-header so it
   stacks above the fixed toast (z-index:100 within its layer).

3. No running status after trigger: After triggering a job, the
   status badge immediately shows 'running…' with a CSS spinner
   animation, and polls the cron list every 3s (up to 30s) to
   refresh when the job completes.

- Added cron_status_running i18n key in all 5 locales (en, es, de, ru, zh, zh-Hant)
- Added .detail-badge.running CSS class with spinner animation
- New functions: _setCronDetailStatus(), _startCronRunningPoll()

* fix(#1011): address review feedback — poll cleanup, badge persistence, 30s fallback

- _clearCronDetail() now clears _cronRunningPoll interval on navigation
- Poll re-applies 'running' badge after loadCrons() re-render (prevents flicker)
- When poll ends (30s max), detail re-renders with actual status as fallback

* feat: create folder and add space directly from UI (#782)

- After creating a folder via the file tree New folder button, offer to add it as a space via confirm dialog
- Add Create folder if it doesnt exist checkbox in the New Space form
- Backend: support create flag in /api/workspaces/add to mkdir before validation
- i18n: 4 new keys (folder_add_as_space_title/msg/btn, workspace_auto_create_folder) in all 6 locales

* fix: validate workspace path before mkdir to prevent orphan directories

Review feedback (critical): the previous code called mkdir() before
validate_workspace_to_add(), which meant a rejected path (e.g. system dir)
would leave an orphan directory on disk.

New flow:
1. Resolve path and check against blocked system roots BEFORE any mutation
2. mkdir() only if path passes the blocklist check
3. Full validation (exists, is_dir) after mkdir

Also imports _workspace_blocked_roots for the pre-mutation blocklist check.

* fix(#1014): classify model-not-found errors with helpful message

- Add model_not_found error type to streaming.py exception classifier
- Detect 404, 'not found', 'does not exist', 'invalid model' patterns
- Strip HTML tags from provider error messages (nginx 404 pages, etc.)
- Add model_not_found branch to apperror handler in messages.js
- Add i18n key model_not_found_label in all 6 locales
- 15 tests covering detection, sanitization, frontend, and i18n

* feat(ui): add live TPS stat to header

Adds a TPS (Tokens Per Second) chip to the right of the header title bar
that updates live while AI output is streaming.

Metering (api/metering.py)
- Tracks per-session output + reasoning tokens via GlobalMeter singleton
- Per-session TPS = total_tokens / elapsed_time
- Global TPS = average of active sessions' TPS values
- HIGH/LOW are max/min of global_tps snapshots over a 60-minute rolling
  window (only recorded when > 0, so idle periods are excluded)
- Thread-safe with a single lock

Metering events emitted from streaming.py
- Throttled at 100ms from token/reasoning/tool callbacks so the display
  updates rapidly during fast token streams
- 1Hz ticker as fallback for slow streams (exits when no active sessions)
- Final stats emitted on stream end

Routes (api/routes.py)
- Removed POST /api/metering/interval endpoint (dynamic interval via
  focus/blur was replaced with simple always-1s-when-active approach)

UI (static/messages.js, index.html, style.css)
- TPS chip in titlebar: shows 'N.N t/s . N.N high . N.N low'
- Default: '0.0 t/s . 0.0 high' when idle
- Display updates on every metering SSE event (throttled to 100ms)

* feat: session restore speed + title gen reasoning hardening (#1025, #1026)

PR #1025 (@franksong2702): Speed up large session restore paths
- GET /api/session?messages=0 now parses only metadata before the messages array
- Metadata-only loads no longer populate the full-session LRU cache
- Frontend lazy fetch uses resolve_model=0 to avoid cold model-catalog lookup
- Hard reload no longer waits for populateModelDropdown() before restoring session

PR #1026 (@franksong2702): Harden auto title generation for reasoning models
- Raises title-gen completion budget to 512 tokens (reasoning-safe)
- Retries once with 1024 tokens on empty content / finish_reason:length
- Applies retry to both auxiliary and active-agent fallback routes
- Preserves underlying failure reason in title_status on local fallback

Co-authored-by: Frank Song <franksong2702@gmail.com>

* feat: session attention indicators in right slot + last_message_at timestamps (#1024)

PR #1024 (@franksong2702): Polish session attention indicators

- Streaming spinners and unread dots now reuse the right-side actions slot
- Running/unread rows hide timestamps; idle/read rows keep right-aligned timestamps
- Date group carets point down when expanded, right when collapsed
- Pinned group no longer repeats pinned-star icon per row
- Running indicators appear immediately after send (local busy state while /api/sessions catches up)
- Sidebar sorting/grouping/timestamps now prefer last_message_at (derived from last real message)
  so metadata-only saves don't make old sessions appear under Today

Co-authored-by: Frank Song <franksong2702@gmail.com>

* docs: v0.50.207 release notes — 10 PRs, 2169 tests (+36)

---------

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
Co-authored-by: Josh <josh@fyul.link>
Co-authored-by: Frank Song <franksong2702@gmail.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-25 13:07:35 -07:00
nesquena-hermes 12a8c051fb fix: inject full workspace path into agent context for uploaded files (#997)
fix: inject full workspace path into agent context for uploaded files (#997)

Uploaded files (drag-and-drop or paperclip) were saved correctly to the workspace
but the agent message only contained the bare filename — `photo.jpg` instead of the
full path. The agent couldn't call `read_file` or `vision_analyze` without a full path.

`uploadPendingFiles()` now returns `{name, path}` objects from `/api/upload`
(`data.path` was always returned, just never threaded through). The agent message
gets the full absolute path; all display surfaces (badges, session history, INFLIGHT
state, POST body) continue showing only the bare filename.

Three fixes absorbed during review:
- Second `saveInflightState()` call was passing raw `{name,path}` objects instead
  of the `uploadedNames` string array (INFLIGHT localStorage corruption on page reload)
- `attachLiveStream()` was being called with the raw object array; changed to pass
  `uploadedNames` so the `done` handler receives strings, not objects
- `attachLiveStream` `done` handler referenced `uploadedNames` which is out of scope
  there (ReferenceError on every upload success); fixed to use the `uploaded` param

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Closes #996
2026-04-24 23:09:44 -07:00
Basit Mustafa e62338d3a0 fix(queue): drain correct session queue after cross-session stream completion (#964)
When a session finishes streaming while the user has switched to a different
session, setBusy(false) was draining S.session.session_id (the currently
*viewed* session) instead of the session that actually finished. Queued
follow-up messages were silently dropped.

Root cause: setBusy() has no context about which session triggered it.
The activeSid closure variable inside attachLiveStream() knew the right
session but was not propagated.

Fix: add _queueDrainSid module global (null by default). Stream done and
error handlers set it to activeSid immediately before calling setBusy(false).
setBusy(false) reads and clears _queueDrainSid, falling back to S.session if
it is unset (the common case where the user hasn't switched away).

Handlers patched: done event, start-call error handler, stream_end/stream_stop
reconnection fallback, and max-retry error exit.

Co-authored with Claude Sonnet 4.6 / Anthropic.
2026-04-24 12:33:56 -07:00
Basit Mustafa a4b56642d9 perf(streaming): throttle inflight localStorage persist to prevent GC crash (#972)
saveInflightState() is called from syncInflightAssistantMessage() on every
token. It does localStorage.getItem + JSON.parse + mutate + JSON.stringify +
localStorage.setItem on the full inflight state map. For a 5000-token response
with a 10KB messages array this produces ~36MB of JSON churn per second.

This O(response_length) work per token is the primary source of GC pressure
that causes the renderer to crash (Chrome error codes 4/5). The 13.6-second
RunTask we observed in perf traces is a direct consequence: accumulated rAF
callbacks execute all at once after each multi-second GC pause.

Fix: add _throttledPersist() which writes at most once every 2 seconds during
token streaming. State transitions that matter for crash recovery (tool events,
done, start) still call persistInflightState() directly, so at most 2s of
in-flight progress is lost if the tab crashes mid-stream.

The _persistTimer is cleared on 'done' so the final state is always flushed.

Co-authored with Claude Sonnet 4.6 / Anthropic.
2026-04-24 12:33:16 -07:00
nesquena-hermes 86b20d362f fix(streaming): call clearTimeout at all _pendingRafHandle cleanup sites (#985)
_scheduleRender() now uses setTimeout(→rAF) when within the 66ms throttle
window, meaning _pendingRafHandle can hold a setTimeout ID (not a rAF ID).
All 4 cleanup sites only called cancelAnimationFrame(), which is a no-op for
timeout handles, leaving stale callbacks that could fire after stream end.
Fix: call both clearTimeout() and cancelAnimationFrame() at each site.
(clearTimeout is a no-op when called with a rAF handle, and vice versa.)

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-24 11:57:48 -07:00
Basit Mustafa 0217bf5cce perf(streaming): throttle live render to ~15fps to prevent crash under GC pressure (#966)
_scheduleRender() uses requestAnimationFrame to update the live assistant
message during streaming. rAF fires at up to 60fps, but each DOM update
takes 50-150ms on sessions with long histories — far exceeding the 16ms
rAF budget.

During GC pauses (which can run for hundreds of milliseconds), rAF
callbacks accumulate. When the GC yields, the browser executes all
queued callbacks sequentially in a single RunTask. A Chrome performance
trace shows a 13.6-second RunTask containing 1,240 accumulated render
callbacks — which causes the renderer to crash (Chrome error codes 4/5,
ERR_EMPTY_RESPONSE / ERR_CONNECTION_RESET).

Fix: track the last render timestamp and delay scheduling the next rAF
until at least 66ms (15fps) have elapsed since the previous render.
If within the 66ms window, use setTimeout to defer the rAF rather than
skipping it — this batches token updates without dropping any content.

The 66ms interval is conservative enough to prevent runaway accumulation
while fast enough that streaming text still feels immediate. The _renderPending
flag continues to prevent double-scheduling within each interval.

Co-authored with Claude Sonnet 4.6 / Anthropic.
2026-04-24 11:44:47 -07:00
bsgdigital a2d7f311be fix(streaming): prevent dropped characters in incremental smd path (#960)
Detect prefix desync between current display text and already-streamed text, then rebuild the streaming-markdown parser from full content to avoid character loss during live rendering. Add regression assertions for the new desync guard.

Made-with: Cursor

Co-authored-by: bsgdigital <bsg@bsgdigital.com>
2026-04-24 11:04:32 -07:00
bsgdigital e5cf9c5910 fix(streaming): strip malformed DSML function_calls tags (#958)
Handle DeepSeek DSML variants including truncated and spaced tag forms, and sanitize thinking-card text so leaked XML fragments never render. Add regression tests for DSML edge cases and thinking-card sanitization.

Made-with: Cursor

Co-authored-by: bsgdigital <bsg@bsgdigital.com>
2026-04-24 11:04:16 -07:00
bergeouss 23e9070fc5 fix(btw): use correct SSE endpoint /api/chat/stream (#950)
The /btw command was completely non-functional because attachBtwStream()
connected to /api/stream which doesn't exist — the server SSE handler
lives at /api/chat/stream. This caused an immediate 404 on every /btw
request.

Closes #945

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 10:43:44 -07:00
nesquena-hermes 061af78cde v0.50.185: /btw stream hardening + .venv bootstrap + /reasoning toast (#935 #939 #941 #942)
* fix(bootstrap): discover .venv layout in agent_dir (closes #938) (#941)

* fix(btw): harden _streamDone flag — defensive ordering + session guard + stream_end coverage (#935)

* fix(btw): align /reasoning toast prefix with BRAIN const (#939)

* docs: v0.50.185 release notes, update test counts to 2107

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-23 23:25:45 -07:00
nesquena-hermes 1a9dba7844 fix: reasoning chip dropdown visible + monochrome SVG icon + /btw answer preserved (closes #933) (#934)
* fix: reasoning chip dropdown visible + SVG icon + /btw answer no longer wiped (closes #933)

* fix(ui): resize handler symmetry + lock regressions for PR #934 fixes

Two small additions on top of the core PR:

1. Resize handler now re-positions the reasoning dropdown when the window
   resizes while it's open, matching the existing model-dropdown branch.
   Without this, resizing while the dropdown is open leaves it aligned to
   the pre-resize chip position — fine in practice (most resizes close the
   dropdown via the global click handler) but inconsistent with the
   model-dropdown sibling.

2. Regression test file tests/test_reasoning_chip_btw_fixes.py with 10
   tests locking all four fixes in place so they can't silently regress:

   - Dropdown sits OUTSIDE .composer-left (so overflow-y: hidden can't clip it)
   - Dropdown is grouped with the other composer-level dropdowns
   - Chip button contains stroke="currentColor" SVG (not a 🧠 emoji)
   - _applyReasoningChip() body doesn't include 🧠
   - cmdReasoning calls _applyReasoningChip(eff) directly with the
     server-confirmed effort, not syncReasoningChip() (stale cache)
   - _streamDone flag declared, set in done handler, checked in onerror
   - _ensureBtwRow() called in done handler (creates bubble when no tokens arrive)
   - resize handler re-positions composerReasoningDropdown

Full suite: 2056 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:18:51 -07:00
nesquena-hermes 9c69b646ff feat(commands): /background, /btw slash commands + undo button + reasoning chip
Rebased onto master after #931 (aux title routing) to resolve streaming.py conflict.
All changes from both PRs are cleanly integrated.

2088 tests passing (2065 master + 23 from #931).

Co-authored-by: bergeouss <bergeouss@gmail.com>
2026-04-24 01:24:51 +00:00
Nathan Esquenazi b563484a56 fix(smd): strip javascript:/data:/vbscript: URLs — smd does not sanitize schemes
streaming-markdown@0.2.15 preserves arbitrary URL schemes in href/src.
Verified with a Node + jsdom harness:

  IN : [click](javascript:alert(1))
  OUT: <p><a href="javascript:alert(1">click</a>)</p>        ← XSS vector

Confirmed unsafe for: javascript:, vbscript:, data:text/html, file://.
The library uses only safe DOM primitives (createElement/appendChild/
createTextNode — no innerHTML/eval), so <script> tags are escaped as
text, but URL-scheme filtering is absent. The existing renderMd() path
implicitly filtered to http(s) via its regex, so this is a regression
the moment streaming markdown is enabled.

Attack path: agent echoes prompt-injection content containing a
markdown link with javascript: href → smd renders it live → user clicks
during the streaming window → JS executes in webui origin → session
cookie, API calls, etc.

Fix: walk the live DOM after each parser_write (and again after
parser_end) and remove href/src attributes whose scheme isn't on the
safe allowlist (http, https, mailto, tel, and relative/anchor paths).
Blocked anchors keep their text content but lose href; blocked images
lose src and get data-blocked-scheme="1" for debugging.

Harness confirms all 10 tested cases behave correctly — javascript:,
vbscript:, data:text/html, file:// all stripped; https://, /path,
#anchor, mailto:, tel: all preserved.

Added 5 regression tests in TestSmdUrlSchemeSanitization that lock:
  - the sanitize helper exists
  - the allowlist regex permits https? and forbids javascript/vbscript/data:
  - _smdWrite invokes sanitize after parser_write
  - _smdEndParser invokes sanitize after parser_end
  - the sanitizer covers both <a href> and <img src>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:28:40 -07:00
nesquena-hermes 89b0c8eb41 feat: incremental streaming markdown via streaming-markdown (v0.50.180, #917)
Co-authored-by: bsgdigital
2026-04-23 23:09:08 +00:00
nesquena-hermes 5082f426f2 fix: correct interleaved streaming order (Text → Thinking → Tool → Text) (#913)
* fix: correct interleaved streaming order (Text → Thinking → Tool → Text)

During live streaming, tool cards were inserted before their associated
thinking cards instead of after them. The root cause was that
appendLiveToolCard's anchor selector didn't include .thinking-card-row,
so finalized thinking cards were skipped when finding the insertion point.

Changes:
- messages.js: Add segment splitting (segmentStart/_freshSegment) so each
  text segment after a tool call renders only its own slice, not the full
  accumulated text. Sync thinking card render in reasoning handler to
  avoid rAF race with tool events. Guard removeThinking() to preserve
  finalized cards when reasoningText is active.
- ui.js: Add .thinking-card-row to appendLiveToolCard anchor selector so
  tool cards land after finalized thinking. Add anchor-based positioning
  to appendThinking for correct interleaved placement. Clean up empty
  spinner-only thinking rows in finalizeThinkingCard. Add 3-dot waiting
  indicator (toolRunningRow) after tool cards for visual feedback.
- style.css: Scope blinking cursor to last live-assistant segment only.
  Add spacing for toolRunningRow.

* chore: CHANGELOG for v0.50.174

---------

Co-authored-by: bsgdigital <bsgdigital@users.noreply.github.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-04-23 13:23:43 -07:00
nesquena-hermes b82954ee70 feat(ui): session attention indicators — streaming spinner, unread dot, timestamps (#856)
Closes #856. Co-authored-by: Frank Song <138988108+franksong2702@users.noreply.github.com>
Reviewed-by: nesquena (709bd37 — test isolation fix also included)
2026-04-23 09:05:57 -07:00
nesquena-hermes d39d30a213 fix: correct message ordering after task cancellation — v0.50.163 (#883)
fix: correct message ordering after task cancellation — v0.50.163 (#883)

Fixes the message-ordering glitch from #882: clicking Cancel while the
agent is responding could cause a subsequent response to render above
the "*Task cancelled.*" marker.

Root cause: the cancel handler pushed the marker only to local S.messages
without persisting to the server. When the done event fired shortly after
and replaced S.messages from server state, the marker disappeared from
client state while the next response anchored to the server-authoritative
position.

Fix has three parts:
- Server (cancel_stream): append *Task cancelled.* to session.messages
  with _error:True + timestamp, then save. _error ensures
  _sanitize_messages_for_api() strips it from conversation_history on
  the next agent turn, so the LLM never sees it as a prior assistant
  turn. Precedent: same flag used for the apperror marker at line 1343.
- Client (SSE cancel handler): fetch /api/session instead of pushing
  locally (same pattern as the done handler). Falls back to local push
  if the fetch fails.
- Tests: fix test window width for cancel handler (1200→dynamic); add
  two regression tests pinning _error flag and _sanitize invariant.

1941 tests passing.

Co-authored-by: piliang <piliang1@jd.com>
2026-04-22 22:17:40 -07:00
nesquena-hermes 558b1730a6 fix: thinking card no longer mirrors main response — v0.50.154 (closes #852)
Remove early return in _streamDisplay() bypassing think-block stripping when reasoningText populated.
2026-04-22 20:21:42 +00:00
nesquena-hermes db57c47ff3 fix(ui): slash command input now echoed as user message in chat (closes #840)
* fix(ui): echo slash command input as user message in chat (#840)

Slash commands like /skills, /help, /status previously showed only the
assistant response with no user message above it — the conversation
appeared to start from nowhere.

Fix: executeCommand() now returns {noEcho:bool} instead of true/false
(returns null when no command matched). send() in messages.js pushes a
user message bubble before returning when noEcho is false.

Commands with noEcho:true are action-only and don't get echoed:
/clear, /new, /stop, /retry, /undo, /voice, /model, /workspace,
/theme, /usage, /reasoning.

Commands without noEcho (get echoed):
/help, /skills, /status, /title, /compress, /compact, /personality.

16 new tests in test_issue840_slash_echo.py.

* fix(ui): push user message BEFORE running slash handler (ordering bug)

The PR as originally written pushed the user message AFTER the slash
command handler ran.  That works correctly for async handlers (the
assistant response lands later, after the user push) but breaks for
sync handlers like cmdHelp which push their assistant response
synchronously:

  S.messages = [assistant response, user "/help"]   ← reverse order

The chat would render the help content ABOVE the user's own "/help"
input — not what the issue asked for.

Fix: look up the command inline, push the user message first (for
echo-worthy commands), then run the handler.  If the handler opts out
(returns false — e.g. /reasoning <level>), pop the user message back
off so the normal send path can add it cleanly when forwarding to the
agent.

Renamed the flow so it's clear we're not calling executeCommand twice
(my first attempt did that by accident).  executeCommand() stays as a
public API returning null or {noEcho:bool} — just isn't the only path
send() uses now.

Added 2 regression tests:

- test_send_pushes_user_message_before_running_handler: asserts
  the user push appears before the handler invocation in source order.
- test_send_rolls_back_user_push_on_handler_optout: asserts the
  S.messages.pop() for the opt-out case.

Also tightened the existing `test_send_checks_noecho_flag` and
`test_send_pushes_user_message_for_echo_commands` tests to look at
the new `_cmd.noEcho` pattern inline (vs the original
`cmdResult.noEcho`).  Removed `test_send_uses_null_check_not_truthy`
(obsoleted — the control flow no longer stores the executeCommand
return in a variable).

Full suite: 1767 passed, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): compress/compact noEcho + title/personality confirmation messages

Applied Opus mentor review fixes:
- compress and compact: add noEcho:true (S.messages reset internally causes
  user bubble to flicker/disappear without noEcho)
- /title <name>: push assistant confirmation message after rename succeeds
- /personality <name>: push assistant confirmation message after set succeeds
- 4 new regression tests covering the above invariants

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 23:08:24 -07:00
nesquena-hermes 859602340e fix: streaming race conditions (#631) + blank-page workspace binding (#804)
Closes #631. Closes #804.

Bug A (thinking card below answer / double render / stuck cursor): trailing rAF after 'done'
inserted a duplicate live-turn wrapper into already-settled DOM. Fixed via _streamFinalized flag
+ cancelAnimationFrame in all terminal handlers (done/apperror/cancel/_handleStreamError) +
_scheduleRender guard. All three reported symptoms were the same root cause.

Bug B (accumulator reset): original fix reset assistantText/reasoningText inside _wireSSE on reconnect.
Reverted — server uses one-shot queue.Queue(), no replay on reconnect, reset would wipe valid
pre-drop content causing data loss. Bug A fix alone resolves all symptoms.

#804 (blank page workspace): syncWorkspaceDisplays uses S._profileDefaultWorkspace as fallback;
workspace chip enabled when hasWorkspace (not hasSession); promptNewFile/promptNewFolder/
switchToWorkspace/promptWorkspacePath auto-create session on blank page; boot.js hydrates
_profileDefaultWorkspace from /api/settings before any session exists.

Opus max-effort review + Nathan independent review + full browser QA. 1765/1765 tests.
2026-04-21 18:47:40 -07:00
nesquena-hermes 811424a87b feat(reasoning): full /reasoning CLI parity — show|hide + effort levels via config.yaml (#812)
Closes #461

Adds full /reasoning CLI parity to the WebUI slash command system:

- /reasoning show|on → window._showThinking = true; writes display.show_reasoning to config.yaml (same key as CLI); mirrors to settings.json for boot.js
- /reasoning hide|off → same in reverse; re-renders immediately
- /reasoning none|minimal|low|medium|high|xhigh → POST /api/reasoning → writes agent.reasoning_effort to config.yaml; takes effect next turn (matching CLI semantics)
- /reasoning (no args) → GET /api/reasoning → live status toast from config.yaml
- Autocomplete shows all 8 options: show|hide|none|minimal|low|medium|high|xhigh
- Profile-isolated: _get_config_path() is thread-local so per-profile settings never bleed across
- Boot hydration: window._showThinking initialised from settings.json show_thinking on page load
- Inspect.signature guard in streaming.py so older hermes-agent builds don't TypeError

28 new tests, 1708/1708 total passing. Full browser QA on port 8789 with isolated state. CLI/config.yaml sync verified with hermes_constants.parse_reasoning_effort().
2026-04-21 15:26:52 -07:00
nesquena-hermes 765d8520d4 fix(streaming): quota error detection, error persistence, stream_end session_id fix (#767)
- quota_exhausted error type: distinguishes credit exhaustion from rate limits
- Streaming errors persisted to session file so they survive page reload
- _error flag excludes persisted errors from subsequent LLM API calls
- stream_end and title SSE events use original session_id (not s.session_id which rotates during context compaction)

Fixes #739, #652, #653
2026-04-20 22:48:19 +00:00
nesquena-hermes 7f16a41a31 fix: normalize stale session models after provider switch — v0.50.99 (#751)
## Summary

Rebased-on-behalf of @likawa3b (originally PR #748 — stale base).

Sessions can outlive provider changes. When an old session still points to a model from a previous provider (e.g. `gemini-3.1-pro-preview` after switching the agent to OpenAI Codex), starting a chat hits the wrong backend and fails silently.

This PR adds a lightweight normalization pass:
- `_normalize_provider_id()` maps common prefixes to canonical provider IDs
- `_resolve_compatible_session_model()` checks the session model's provider against `active_provider` and returns the default model if they differ
- `_normalize_session_model_in_place()` is called at GET `/api/session` — corrects and persists stale models once
- Chat start also normalizes via `_resolve_compatible_session_model()` and returns `effective_model` in the response
- `messages.js` applies `effective_model` back to the UI/localStorage/dropdown if set

Closes #748

## Tests

1498 passed (2 pre-existing ordering failures unrelated to this PR; 5 new tests added in `test_provider_mismatch.py`).

**Original author:** @likawa3b
2026-04-19 23:22:26 -07:00
nesquena-hermes 877a32f49c fix: XML tool-call leak + workspace empty-state + notification text — v0.50.92 (PR #712)
Strips <function_calls> XML from assistant messages before rendering, adds workspace file panel empty-state messages, and changes notification description from 'tab' to 'app'. 16 new tests. Fixes #702, #703, #704.
2026-04-19 05:40:37 +00:00
nesquena-hermes b49de92893 feat(/compress): manual session compression with focus topic — closes #469 (PR #619 by @franksong2702)
POST /api/session/compress with optional focus_topic. Transcript-inline cards: command, running, complete (collapsible green), reference. /compact alias kept. Fixes: var(--green) undefined color, focus_topic 500-char cap. Independent review by @nesquena (4 passes).
2026-04-18 06:55:04 +00:00
nesquena-hermes bded1cf906 fix(streaming): strip Gemma 4 thinking token delimiter in all paths — closes #607
Fixes <|turn|>thinking delimiter (was wrong as <|turn>thinking) in api/streaming.py, static/messages.js, and static/ui.js. Adds 13 regression tests. Independent review by @nesquena.
2026-04-18 06:45:39 +00:00
Aron Prins 9a3dc10d93 feat: redesign chat transcript + fix streaming/persistence lifecycle — v0.50.70 (PR #587 by @aronprins)
Redesign chat transcript + fix streaming/persistence lifecycle — v0.50.70

Squash-merges PR #587 by @aronprins (Aron Prins). Full credit to @aronprins for all feature and fix work.

Transcript redesign: unified --msg-rail/--msg-max CSS variables, user turns as tinted cards, thinking cards as bordered panels, error card treatment, day-change separators, composer fade.

Approval/clarify as composer flyouts: cards slide up from behind composer top, overflow:hidden + translateY clip prevents travel visibility, focus({preventScroll:true}).

Streaming lifecycle: DOM order user→thinking→tool cards→response, no mid-stream jump. Live tool cards inserted before [data-live-assistant].

Persistence: reasoning attached before s.save(), _restore_reasoning_metadata on reload, role=tool rows preserved in S.messages, CLI-session tool-result fallback.

Workspace panel FOUC fix: [data-workspace-panel] set at parse time.

Docs: docs/ui-ux/index.html + two-stage-proposal.html.

Maintainer additions (433b867): CHANGELOG v0.50.70, version badge, usage badge loop simplification.

Reviewed and approved by @nesquena (independent review). 1361 tests passing.
2026-04-16 14:04:42 -07:00