601 Commits

Author SHA1 Message Date
nesquena-hermes f2194f13cd Stage 320: PR #1860 — request wedge diagnostics by @franksong2702 2026-05-08 15:37:08 +00:00
Frank Song 7e2709e281 fix: add request wedge diagnostics 2026-05-08 15:37:08 +00:00
Frank Song 6808e06083 fix: isolate profile quota usage probes 2026-05-08 15:37:07 +00:00
nesquena-hermes a11cbd3ee9 Stage 319: PR #1862 — preserve local custom provider model ids by @franksong2702 2026-05-08 15:16:18 +00:00
Frank Song 414c474d97 fix: preserve local custom provider model ids 2026-05-08 15:16:18 +00:00
nesquena-hermes 1105d496e9 Stage 319: PR #1887 — cross-container gateway liveness via state-file freshness fallback by @Sanjays2402 2026-05-08 15:15:50 +00:00
Sanjay Santhanam efcfff3d7f fix(#1879): cross-container gateway liveness via state-file freshness
The dashboard banner 'Hermes agent is not responding' fires on every
multi-container deployment that doesn't set 'pid: "service:hermes-agent"'
in compose, because get_running_pid() relies on fcntl.flock and
os.kill(pid, 0) — both PID-namespace-scoped and invisible across container
boundaries.

Fix: when get_running_pid() returns None, fall back to a freshness check on
gateway_state.json. The gateway already writes that file on every tick with
gateway_state == 'running' and an aware ISO-8601 updated_at timestamp, so a
recent (<= 120s) timestamp is an equivalent live-process signal that needs
only a shared volume — no PID namespace, no compose workaround, no extra
HTTP probe URL.

Behavior preserved:
- In-namespace deployments still hit the PID-based path first; payload shape
  unchanged (no 'reason' key) so #716 contract holds.
- Cross-container alive path adds reason='cross_container_freshness' so
  support diagnostics can tell which signal succeeded.
- Stale updated_at, non-running gateway_state, malformed/naive/missing
  timestamps, and timestamps far in the future all still report 'down' — the
  fallback never produces a false positive.
- Same redaction rules: argv/command/executable/env/raw pid never leak.

Tests: 15 new cases in test_issue1879_cross_container_gateway_liveness.py
covering the cross-container alive path, every refusal case, clock-skew
tolerance, and backward compat with the #716 PID path. Existing #716
heartbeat tests (8) continue to pass.
2026-05-08 15:15:50 +00:00
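The freshness fallback described in efcfff3d7f can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `state_file_is_fresh` and the 5-second future tolerance are assumptions, while the 120-second window, the `gateway_state == 'running'` check, and the aware ISO-8601 `updated_at` field come from the commit message.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

FRESHNESS_WINDOW = timedelta(seconds=120)  # "<= 120s" per the commit message
FUTURE_TOLERANCE = timedelta(seconds=5)    # assumed clock-skew allowance


def state_file_is_fresh(state_path, now=None):
    """Return True only when the state file proves a recently running gateway.

    Hypothetical helper: every malformed, stale, naive, or far-future case
    returns False so the fallback cannot report a false positive.
    """
    now = now or datetime.now(timezone.utc)
    try:
        state = json.loads(Path(state_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed file
    if not isinstance(state, dict) or state.get("gateway_state") != "running":
        return False
    raw = state.get("updated_at")
    if not isinstance(raw, str):
        return False
    try:
        updated_at = datetime.fromisoformat(raw)
    except ValueError:
        return False
    if updated_at.tzinfo is None or updated_at.utcoffset() is None:
        return False  # naive timestamps are refused
    age = now - updated_at
    return -FUTURE_TOLERANCE <= age <= FRESHNESS_WINDOW
```

A caller would presumably consult this only after the PID-based check returns None, and attach `reason='cross_container_freshness'` to the payload when it succeeds.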
Sanjay Santhanam a958c29373 fix(config): phantom Custom group when active provider is ai-gateway (#1881)
Two bugs in get_available_models() conspired to duplicate the active
provider's auto-detected models under a phantom 'Custom' group whenever
custom_providers was also declared in config.yaml:

1. custom:* PIDs not in _named_custom_groups (e.g. stale slugs left from
   prior configs) fell through to the auto_detected_models fallback, copying
   the active provider's whole catalog into a phantom Custom: <slug> group.
   Fix: continue unconditionally for ANY custom:* PID — the named-group
   branch is the only legitimate population path.

2. The bare 'custom' PID, with the active provider being concrete (e.g.
   ai-gateway), hit 'elif auto_detected_models: copy.deepcopy(...)' and
   built a duplicate Custom group of the active provider's models with
   mismatched provider prefixes. Fix: when pid == 'custom' and the active
   provider is non-custom, leave models_for_group empty.

The reporter also suggested a third fix gating resolve_model_provider() on
config_provider — that's intentionally NOT applied because it conflicts with
the long-standing model-specific-override semantics covered by
test_model_resolver.py::test_custom_provider_*_routes_to_named_custom_provider
(custom_providers entries explicitly override the active provider's routing
when the user opted-in). The reporter's symptom (duplicate UI group) lives
entirely in get_available_models()'s group construction and is fully fixed
by the two changes above.

Tests: 6 new regression tests (3 in #1881 file + reuse), 774 broader
tests still green (model/provider/custom/config domain).
2026-05-08 15:15:49 +00:00
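The two group-construction rules from a958c29373 might look something like the sketch below. The function `build_model_groups` and its parameters are hypothetical stand-ins for the relevant part of `get_available_models()`; only the two rules themselves (skip every `custom:*` PID outside the named groups, and leave the bare `custom` group empty when the active provider is concrete) are taken from the commit message.

```python
import copy


def build_model_groups(pids, named_custom_groups, active_provider, auto_detected_models):
    """Hypothetical, simplified group builder illustrating the two fixes."""
    active_is_custom = active_provider == "custom" or active_provider.startswith("custom:")
    groups = {}
    for pid in pids:
        if pid.startswith("custom:"):
            # Fix 1: named groups are the only legitimate population path,
            # so stale slugs never reach the auto-detected fallback.
            if pid in named_custom_groups:
                groups[pid] = copy.deepcopy(named_custom_groups[pid])
            continue
        if pid == "custom" and not active_is_custom:
            # Fix 2: do not mirror a concrete active provider's catalog.
            groups[pid] = []
            continue
        if pid == active_provider and auto_detected_models:
            # Assumed shape of the pre-existing auto-detected fallback.
            groups[pid] = copy.deepcopy(auto_detected_models)
        else:
            groups[pid] = []
    return groups
```

With an active provider of `ai-gateway`, a stale `custom:old` slug yields no group at all and the bare `custom` group stays empty, which is the behavior the commit describes.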
nesquena-hermes 0dcce8e434 Stage 318: PR #1859 — fix: persist generated title refresh marker by @ai-ag2026 2026-05-08 15:01:48 +00:00
ai-ag2026 755c18bdf9 fix: persist generated title refresh marker 2026-05-08 01:36:10 +02:00
ai-ag2026 f69a81c8c3 fix: preserve pending turn during stale cleanup 2026-05-07 23:57:01 +02:00
ChaseFlorell d8612ba323 fix: add cdn.jsdelivr.net to CSP connect-src to allow xterm source map fetches
Closes #1850

Co-authored-by: Chase Florell <ChaseFlorell@users.noreply.github.com>
2026-05-07 20:42:55 +00:00
hermes-agent 5f6a55185c stage-315 absorb: document handle_kanban_* three-valued return contract
Per Opus pre-release verdict on PR #1843: the four handle_kanban_*
entry points declare '-> bool' but actually return True | None | False
(after PR #1843 made the False-vs-None distinction load-bearing for
the caller's '_kanban_unknown_endpoint' decision). Update the type
annotations to 'bool | None' and add a docstring on handle_kanban_get
(with cross-references on the three siblings) so a future contributor
adding a new return path doesn't accidentally produce a 0/'' value
that would silently revert the double-404 fix.

Test-only verification: kanban tests pass (49/49). Production behavior
unchanged. Cheap defensive cleanup per Nathan's standing absorb-in-release
default for ≤20-LOC documentation/type-annotation fixes.
2026-05-07 18:52:01 +00:00
Michael Lam 78c09e1fd9 fix: keep shell route errors html 2026-05-07 18:41:13 +00:00
nesquena-hermes 740e5412a5 Stage 315: PR #1838 — show auto-compression running state by @Michaelyklam 2026-05-07 18:41:13 +00:00
Michael Lam e31b7e72d6 fix: show auto-compression running state 2026-05-07 18:41:13 +00:00
Nathan Esquenazi f3b56d8793 fix(kanban): avoid double 404 when bridge already sent error response
PR #1837's new `_kanban_unknown_endpoint` wrapper was triggered for any
falsy bridge return — but `handle_kanban_*` returns `None` (not `True`)
when an inner handler calls `bad(...)` to send an error response. The
wrapper then sent a SECOND 404 on top of the bridge's response, producing
concatenated JSON bodies on the wire.

Concrete reproducer (caught by behavioural harness, not the merged tests):

    GET /api/kanban/tasks/<missing-id>/log
    →  '{"error":"task not found"}{"error":"unknown Kanban endpoint: GET ..."}'

This affected every `bad(...)`-shaped error path in the bridge:
- task-not-found returns from `_task_log_payload` / `_task_detail_payload`
- exception handlers for ImportError (503), LookupError (404),
  ValueError (400), RuntimeError (409) across all four method handlers
- the `_handle_events_sse_stream` board-resolution failure path

The fix: distinguish an explicit `False` (truly unmatched path) from
`None` (handled, response already sent). Only `False` should trigger
the unknown-endpoint diagnostic.

Adds a regression test that exercises the task-not-found path through
`routes.handle_get` and asserts only one JSON body is on the wire.

Follow-on to #1837 (already merged into master at v0.51.20).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:35:57 -07:00
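The False-versus-None distinction in f3b56d8793 can be illustrated with a small self-contained sketch. The names `bad` and `handle_kanban_get` appear in the commit message, but their bodies here are simplified assumptions, and `RecordingHandler` and `route_get` are hypothetical stand-ins for the real handler and routing code.

```python
class RecordingHandler:
    """Stand-in for the HTTP handler; records every response that is sent."""

    def __init__(self):
        self.responses = []

    def send_json(self, status, payload):
        self.responses.append((status, payload))


def bad(handler, message, status=400):
    """Send an error response and return None: handled, response already sent."""
    handler.send_json(status, {"error": message})
    return None


def handle_kanban_get(handler, path):
    """Simplified bridge entry point with a three-valued return: True | None | False."""
    if path == "/api/kanban/boards":
        handler.send_json(200, {"boards": []})
        return True   # matched and answered
    if path.startswith("/api/kanban/tasks/") and path.endswith("/log"):
        return bad(handler, "task not found", 404)  # matched, error already sent
    return False      # truly unmatched path


def route_get(handler, path):
    result = handle_kanban_get(handler, path)
    # Only an explicit False triggers the diagnostic; a plain falsy check
    # ("if not result") would also fire on None and send a second body.
    if result is False:
        handler.send_json(404, {"error": f"unknown Kanban endpoint: GET {path}"})
    return result
```

Using an identity check (`is False`) rather than truthiness is what keeps a future `0` or `''` return from silently reintroducing the double response.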
hermes-agent 0ed63968b6 Stage 314: PR #1827 — sync Codex provider card models with picker by @Michaelyklam
Note: PR #1827 was branched before v0.51.19 shipped #1812, which
introduced an initial (pure live-fetch) Codex provider card hook in
api/providers.py at the same line range. The contributor's PR was
filed AFTER #1812 shipped but their diff didn't yet account for it.
Stage 314 absorbs the contributor's intent (visible Codex cache
merge for gpt-5.3-codex-spark visibility) by replacing the v0.51.19
hook with the richer merged version directly in stage. Production
code change ≡ what the contributor's PR would have produced if
rebased onto current master. Test file + pr-media adopted verbatim.
Marker commit so the stage log makes the absorption visible.
2026-05-07 17:58:52 +00:00
Michael Lam bb75707331 fix: surface stale Kanban client recovery 2026-05-07 17:57:09 +00:00
hermes-agent 1f702c7569 stage-313 absorb: gate _resolve_configured_provider_id alias resolution + harden bootstrap test isolation
Two in-stage fixes for v0.51.19 batch:

1) api/config.py — add resolve_alias=False param to
   _resolve_configured_provider_id() and pass it from
   resolve_model_provider(). The PR #1818 swap from
   _resolve_provider_alias() to _resolve_configured_provider_id()
   was correct for active-provider/badge surfaces but broke #1625's
   local-server-provider literal-preservation contract: 'ollama' →
   'custom' and 'lm-studio' → 'lmstudio' alias-collapse caused
   _LOCAL_SERVER_PROVIDERS membership check to miss, breaking the
   model-id full-path preservation for LM Studio/Ollama. The new
   flag preserves the raw provider value when called from
   resolve_model_provider, and named-custom-slug + base-url
   fallback both still run unchanged.

2) tests/test_bootstrap_discover_agent.py — pin Path.home() in
   _isolate_discover_agent_dir so the hard-coded
   'Path.home() / .hermes / hermes-agent' / 'Path.home() /
   hermes-agent' candidates in discover_agent_dir() can't pick up
   the dev machine's real install. The original PR #1817 isolation
   helper covered HERMES_HOME, HERMES_WEBUI_AGENT_DIR, and
   REPO_ROOT but missed the Path.home() leak.

Both surfaced on full pytest pre-release gate, fixed in stage,
ship in v0.51.19. Tests: full suite green.
2026-05-07 17:07:48 +00:00
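The alias gate from item 1 of 1f702c7569 can be sketched as below. This is an assumption-heavy illustration: the alias pairs ('ollama' to 'custom', 'lm-studio' to 'lmstudio') come from the commit message, but the membership of `_LOCAL_SERVER_PROVIDERS`, the default value of `resolve_alias`, and the helper `keeps_full_model_path` are guesses made for the example.

```python
# Alias pairs named in the commit message.
_PROVIDER_ALIASES = {"ollama": "custom", "lm-studio": "lmstudio"}
# Assumed membership; the real set lives in api/config.py.
_LOCAL_SERVER_PROVIDERS = {"ollama", "lm-studio"}


def resolve_configured_provider_id(raw_provider, resolve_alias=True):
    """Simplified stand-in: normalize the provider, optionally collapsing aliases."""
    provider = raw_provider.strip().lower()
    if resolve_alias:
        return _PROVIDER_ALIASES.get(provider, provider)
    return provider  # raw value preserved for literal membership checks


def keeps_full_model_path(raw_provider):
    """Hypothetical caller mirroring the resolve_model_provider() use case."""
    provider = resolve_configured_provider_id(raw_provider, resolve_alias=False)
    return provider in _LOCAL_SERVER_PROVIDERS
```

With alias resolution left on, 'ollama' would collapse to 'custom' before the membership check and the check would miss, which is the regression the commit describes.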
Frank Song 8bc2677691 fix: repair file picker and html preview interactions 2026-05-07 16:59:00 +00:00
ai-ag2026 ae22a80238 fix: hide workspace metadata in user bubbles 2026-05-07 16:58:59 +00:00
ai-ag2026 7d5704c3bc fix: keep cross-surface session continuations visible 2026-05-07 16:58:39 +00:00
nesquena-hermes 5e01b00b8b Stage 313: PR #1809 — dedupe workspace-prefixed user turns after compaction by @ai-ag2026 2026-05-07 16:58:16 +00:00
ai-ag2026 256866ace6 fix: dedupe workspace-prefixed user turns after compaction 2026-05-07 16:58:16 +00:00
Frank Song f7902776d4 fix: use live Codex models in providers card 2026-05-07 16:58:16 +00:00
nesquena-hermes db7b72596e Stage 313: PR #1805 — provider account quota cards by @franksong2702 2026-05-07 16:58:15 +00:00
nesquena-hermes 6ab384618a Stage 313: PR #1818 — named custom provider routing by @franksong2702 2026-05-07 16:56:49 +00:00
Michael Lam 1192a0a766 fix: preserve inaccessible workspace entries 2026-05-07 16:56:48 +00:00
Frank Song 3ac89c2696 fix: route named custom provider model selections 2026-05-07 21:40:23 +08:00
Frank Song a6b88c8c1e feat: show account limits in provider quota 2026-05-07 17:36:04 +08:00
Michael Lam 048f1fa24e fix: keep assistant-only stream deltas on current turn 2026-05-07 06:25:16 +00:00
Sanjay Santhanam 064d14c85b fix(config): custom provider + :free/:beta/:thinking suffix mis-resolution (#1776)
PR #1762 fixed the rsplit grammar collision for plain @openrouter:model:free
qualifiers, but skipped the fallback whenever the provider hint started with
'custom:' on the assumption that custom providers route directly. That left
'@custom:my-key:some-model:free' broken: rsplit yields
provider='custom:my-key:some-model', bare='free' → custom guard skips the
split-fallback → returns provider='custom:my-key:some-model', model='free'.

Detect the over-split structurally instead of using a known-suffix allowlist:
custom hints carry exactly one segment after 'custom:' (constructed at
api/config.py:1363 as 'custom:' + entry_name). So any rsplit result of
'custom:<a>:<b>' with bare model '<c>' has eaten one model segment — peel
it back with a second rsplit and prepend it to the bare model.

This is robust for :free / :beta / :thinking / :preview / any future
OpenRouter suffix without an allowlist to maintain.

Adds 5 regression tests covering the matrix (free/beta/thinking/preview/
slashed-model). All 7 existing #1744 tests still pass; #1228 tests
unaffected.

Co-authored-by: Cake <51058514+Sanjays2402@users.noreply.github.com>
2026-05-07 06:25:16 +00:00
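The structural peel-back from 064d14c85b could be sketched as follows. The function name `split_custom_qualifier` is hypothetical, and the sketch performs exactly the single extra rsplit the commit message describes, relying on the stated invariant that a custom hint carries exactly one segment after 'custom:'.

```python
def split_custom_qualifier(qualified):
    """Split '@provider:model' and undo an over-split custom hint.

    Hypothetical helper; returns (provider_hint, model_id).
    """
    body = qualified[1:] if qualified.startswith("@") else qualified
    hint, sep, model = body.rpartition(":")
    if not sep:
        return None, body  # no provider qualifier at all
    # 'custom:<a>:<b>' means the first split ate one model segment,
    # so peel it back and prepend it to the bare model.
    if hint.startswith("custom:") and hint.count(":") >= 2:
        hint, eaten = hint.rsplit(":", 1)
        model = f"{eaten}:{model}"
    return hint, model


print(split_custom_qualifier("@custom:my-key:some-model:free"))
# ('custom:my-key', 'some-model:free')
```

Because the check is structural, no list of known suffixes such as :free or :beta has to be maintained.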
fxd-jason a80b7695d8 fix(kanban): update stale read-only docstring + board_exists early-out in board counts
The bridge module docstring still described the API as 'deliberately
read-only' but it now exposes full CRUD (tasks, boards, comments,
links, SSE). Updated to list the supported operations.

For _board_counts_for_slug (the hot path for the board-switcher badge),
added a board_exists() early-out that mirrors the agent's own helper
in plugin_api.py (path.exists() before connect()). This avoids a
redundant init_db()+connect() schema pass per board per list refresh.
connect() already handles auto-init for fresh databases via its
needs_init check, so the extra init_db was unnecessary overhead on
the hot path that scales linearly with board count.

Tests:
- test_board_counts_returns_empty_for_nonexistent_board: verifies the
  early-out (no connect() call, returns {})
- test_board_counts_returns_real_counts_for_populated_board: verifies
  actual per-status counts are returned for existing boards
2026-05-07 03:58:16 +00:00
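The early-out in a80b7695d8 follows a common pattern: check that the database file exists before opening a connection. The sketch below assumes a SQLite file with a `tasks(status)` table; both the schema and the function name `board_counts` are illustrative assumptions rather than the project's actual layout.

```python
import sqlite3
from pathlib import Path


def board_counts(db_path):
    """Return per-status task counts, or {} when the board file is absent."""
    path = Path(db_path)
    if not path.exists():
        # Early-out: sqlite3.connect() would otherwise create an empty file
        # and pay for a connection on every list refresh.
        return {}
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT status, COUNT(*) FROM tasks GROUP BY status"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

The cost saved is small per call, but the commit notes this path runs once per board per refresh, so it scales with board count.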
Michael Lam 0bd65ef0bf fix: preserve CLI session tool metadata 2026-05-07 02:47:19 +00:00
Frank Song 91f99d8194 fix(oauth): serialize Anthropic env fallback reads 2026-05-07 02:47:19 +00:00
Michael Lam 2d20842450 fix: surface Codex usage exhaustion errors 2026-05-07 01:39:52 +00:00
nesquena-hermes f77a44fce2 feat(ux): three high-leverage context-menu essentials from #1764
Issue #1764 asked for a much larger surface (Reveal + Copy-path on
every UI surface that references a file path, plus Rename in session
menus). Per Nathan's curation we ship only the three highest-leverage
pieces in this PR — they cover the three concrete user-visible
frictions Cygnus reported, and leave the broader sweep for follow-up.

## 1. Copy file path in workspace tree right-click menu

The tree's right-click already had Rename and Reveal in File Manager.
Reveal is slow when the user just wants the path string for a
terminal/editor — and there was no Copy-path action anywhere.

Added "Copy file path" between Reveal and Delete. It POSTs to a new
`/api/file/path` endpoint that resolves the relative tree-rooted path
into the absolute on-disk path (the frontend can't compute it because
only the server knows the workspace root) and writes the result to
the OS clipboard via `navigator.clipboard.writeText()`. Falls back to
the legacy execCommand pattern on browsers where the modern Clipboard
API is gated.

The new endpoint deliberately does NOT require the target to exist:
copy-path on a recently-deleted file is still useful (paste into a
terminal to investigate). `safe_resolve` continues to gate path
traversal — the test suite pins this with a `../../../../../etc/passwd`
attempt that 400s.

## 2. Rename in session three-dot menu

Cygnus's specific ask: double-click rename in the sidebar is timing-
sensitive — the first click frequently registers as "open the chat"
before the second click arrives, so users open the conversation when
they meant to rename it. Putting Rename in the menu eliminates the
timing entirely.

Added Rename as the FIRST item in `_openSessionActionMenu` (above
Pin). It reuses the existing `startRename` closure attached to each
session row — no duplicated state, no second API call out of band
with the double-click path. Mechanism: the row builder now stores
`el._startRename = startRename` and `el.dataset.sid = s.session_id`,
so the menu can find the row by data-sid and call its closure
directly. This keeps all the `_renamingSid`/`oldTitle`/`applyTitle`
bookkeeping single-sourced.

Read-only imported sessions skip the menu item via the same
`_isReadOnlySession` gate the closure already uses.

## 3. Reveal-failed toast includes the resolved server-side path

Cygnus posted a screenshot of a "Failed to reveal: not found" toast
that dropped the path entirely. Without it the user can't tell which
file the system expected — useful when a stale session row still
references a deleted file.

Server-side fix in `_handle_file_reveal`: instead of returning
`bad(handler, "File not found", 404)`, return
`bad(handler, f"File not found: {target}", 404)` where target is the
resolved absolute path. Frontend toast also defends against err with
no .message: `(err.message||err)` instead of `err.message` alone.

Verified live: a missing-file reveal now produces:

    Failed to reveal: File not found: /home/hermes/workspace/missing-xyz.txt

Cygnus's exact diagnostic-friction is gone.

## Tests

* tests/test_1764_context_menu_essentials.py (new)
  - 13 source-level pinning tests
  - 6 live HTTP behaviour tests against the conftest test server

* tests/test_1466_sidebar_cancel_clarify.py
  - Two assertion-window bumps (3200→4400, 3600→4800) to accommodate
    the new Rename action prepended to _openSessionActionMenu. The
    test relied on a fixed-byte-window function-body slice — comments
    added explaining why the bumps were needed.

* All 9 locales got translations for the 5 new keys
  (copy_file_path, path_copied, path_copy_failed, session_rename,
  session_rename_desc) — locale parity tests pass.

## Verification

Full pytest suite: 4671 passed, 2 skipped, 3 xpassed (matches
pre-change baseline).

Live browser verification on port 8789:
- Right-click .git folder in workspace tree → menu shows
  Rename / Reveal in File Manager / Copy file path / Delete (red).
- Click Copy file path → clipboard gets "/home/hermes/workspace/.git",
  toast confirms "File path copied to clipboard".
- Open session three-dot menu → Rename conversation appears first
  with pencil icon, followed by Pin / Move / Archive / Duplicate /
  Delete in the same order as before.
- Trigger reveal on a non-existent file → toast reads
  "Failed to reveal: File not found: /home/hermes/workspace/<filename>".
  The resolved server-side path is now visible in the failure.

Refs nesquena/hermes-webui#1764.
2026-05-07 01:39:52 +00:00
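The path handling described for `/api/file/path` in f77a44fce2 (resolve a tree-relative path to an absolute one, do not require the target to exist, still refuse traversal) can be sketched like this. The function `resolve_in_workspace` is a hypothetical stand-in and is not the project's `safe_resolve`.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root, relative_path):
    """Resolve a tree-relative path to an absolute path inside the workspace.

    The target need not exist (Path.resolve() is non-strict by default), but
    anything that escapes the workspace root raises ValueError.
    """
    root = Path(workspace_root).resolve()
    target = (root / relative_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {relative_path}")
    return target
```

A server endpoint built on this would presumably map the ValueError to a 400 response, matching the traversal test the commit mentions.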
Michael Lam 1fc8e83c90 fix: use spawn for manual cron subprocesses 2026-05-07 01:39:51 +00:00
bergeouss 9711070119 fix: resolve rsplit collision for OpenRouter models with :free/:beta/:thinking suffixes (#1744)
The previous approach of prepending 'openrouter/' to the model ID in the
catalog was incorrect — it only masked the symptom while regressing the
config_provider=openrouter codepath.

The root cause is in resolve_model_provider(): rsplit(':', 1) on
'@openrouter:tencent/hy3-preview:free' yields provider='openrouter:tencent/hy3-preview'
and model='free', because the ':free' suffix collides with the @provider:model
grammar.

Fix: after rsplit, validate that the extracted provider hint is a known
provider (in _PROVIDER_MODELS, _PROVIDER_DISPLAY, or starts with 'custom:').
If not, fall back to split(':', 1) so trailing suffixes stay attached to
the model ID.

This fixes all current and future OpenRouter models with colon-suffixed tags
(:free, :beta, :thinking, :nitro, etc.) without catalog changes.

Also adds regression tests for the affected models and edge cases.

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
2026-05-07 01:39:51 +00:00
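The rsplit-then-validate fallback from 9711070119 can be sketched as follows. `KNOWN_PROVIDERS` is a stand-in for the lookups in `_PROVIDER_MODELS` and `_PROVIDER_DISPLAY`, and `split_provider_model` is a hypothetical name; the split order and the 'custom:' exemption follow the commit message.

```python
# Stand-in for membership in _PROVIDER_MODELS / _PROVIDER_DISPLAY.
KNOWN_PROVIDERS = {"openrouter", "anthropic", "openai"}


def split_provider_model(qualified):
    """Split '@provider:model', keeping colon suffixes attached to the model."""
    body = qualified[1:] if qualified.startswith("@") else qualified
    hint, sep, model = body.rpartition(":")  # same cut as rsplit(':', 1)
    if not sep:
        return None, body
    if hint in KNOWN_PROVIDERS or hint.startswith("custom:"):
        return hint, model
    # The right-hand split produced an unknown provider, so a suffix such as
    # ':free' collided with the grammar. Split on the first colon instead.
    hint, _, model = body.partition(":")
    return hint, model


print(split_provider_model("@openrouter:tencent/hy3-preview:free"))
# ('openrouter', 'tencent/hy3-preview:free')
```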
bergeouss ca1a268512 fix: add missing openrouter/ prefix for tencent/hy3-preview:free model (#1744) 2026-05-07 01:39:51 +00:00
test 74edc38aac Stage 308: PR #1757 — fix: gateway status card shows not running when no platforms connected by @skspade 2026-05-06 22:02:51 +00:00
test 54c9fb48dd Stage 308: PR #1756 — fix: isolate profile cookie per webui instance by @ng-technology-llc 2026-05-06 22:02:51 +00:00
skspade 7193cee152 fix: tri-state gateway status — distinguish not-configured from not-running
- Backend: return `configured` field alongside `running`. When
  alive=None (no gateway metadata), configured=false with fallback to
  identity_map heuristic.
- Frontend: amber "Gateway not configured" when configured=false,
  red "Gateway not running" only when configured but process is down,
  green "Running" when both true.
- Replace dead try/except fallback with explicit tri-state check on
  health["alive"].
- Add regression test for last_active guard when alive=true and
  identity_map is empty.

All 87 gateway-related tests pass.
2026-05-06 22:01:36 +00:00
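The tri-state mapping in 7193cee152 can be expressed as a small sketch. The function names are hypothetical, and reading "fallback to identity_map heuristic" as "derive `running` from `bool(identity_map)` when `alive` is None" is an interpretation of the commit message. The real frontend is not Python; the label logic is shown in Python only to keep the example in one language.

```python
def gateway_status(alive, identity_map):
    """Map the health signal (True / False / None) to the API payload fields."""
    if alive is None:
        # No gateway metadata: report not-configured and fall back to the
        # older identity_map heuristic for the running flag.
        return {"configured": False, "running": bool(identity_map)}
    return {"configured": True, "running": bool(alive)}


def status_label(status):
    """Choose the card color and text from the two payload fields."""
    if not status["configured"]:
        return ("amber", "Gateway not configured")
    if not status["running"]:
        return ("red", "Gateway not running")
    return ("green", "Running")
```

The key property is that an empty `identity_map` alone never produces the red state when the health signal says the gateway is alive.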
skspade eab39f14db fix: gateway status card shows 'not running' when no platforms connected
Use agent_health.build_agent_health_payload() as the authoritative
running signal instead of bool(identity_map). An empty identity_map
means zero connected messaging platforms, not that the gateway is down.

Falls back to identity_map heuristic when agent_health module is unavailable
(e.g. WebUI-only deployments).
2026-05-06 22:01:35 +00:00
Nick d5a31a0f4d fix: isolate profile cookie per webui instance 2026-05-06 22:01:20 +00:00
ai-ag2026 a7b04bbc1e fix: preserve pending user turn on stream errors 2026-05-06 22:47:58 +02:00
Michael Lam dcc8268c92 fix: drain cron subprocess results before join 2026-05-06 18:11:14 +00:00
Michael Lam b9bf00efe1 fix: shorten cron profile lock for manual runs 2026-05-06 18:11:14 +00:00
Michael Lam 276570faec fix: route custom provider models dict selections 2026-05-06 18:11:12 +00:00