Commit Graph

938 Commits

Author SHA1 Message Date
Frank Song 4e8899592d Prefer worktree retention responses in session UI 2026-05-12 10:17:12 +08:00
Frank Song 2da4f108c5 Clarify worktree session archive/delete semantics
(cherry picked from commit f5c8fb58d1)
2026-05-12 00:05:05 +00:00
nesquena-hermes e20eb2c784 fix: skip budget-doubling title retry for reasoning-only responses (#2083)
Reasoning models (Qwen3-thinking via LM Studio, DeepSeek-R1, Kimi-K2,
etc.) can burn their entire output budget on hidden reasoning tokens and
emit no visible content. The previous title-generation retry path
classified that as llm_length and doubled the budget — but the second
call produces the same shape, so the retry only doubled the GPU/credit
burn. Repeated across the two prompts in _title_prompts() this came to
~3000 reasoning tokens of GPU work per new chat. On local LM Studio
servers behind a custom: provider (where is_lmstudio=False means
reasoning_effort: none never reaches the model) it manifested as the GPU
never going idle after a prompt.

Fix:
  - _extract_title_response: classify reasoning-bearing empty responses
    as llm_empty_reasoning regardless of finish_reason. The presence of
    reasoning_content is the diagnostic signal, not finish_reason.
  - _title_retry_status: drop llm_empty_reasoning from the retry set.
    Length-truncated responses WITHOUT reasoning still retry (those are
    legitimately recoverable by a larger budget).
  - Add _title_should_skip_remaining_attempts() and break out of the
    prompt-iteration loop on empty-reasoning. A second prompt against
    the same model would produce the same shape.
  - Falls through to _fallback_title_from_exchange for a local-summary
    title.

Tests updated to invert the previous reasoning-retry assertions:
  - test_aux_short_circuits_on_empty_reasoning_without_retrying
  - test_aux_still_retries_finish_length_without_reasoning
  - test_agent_route_short_circuits_on_empty_reasoning_without_retrying
  - test_agent_route_still_retries_finish_length_without_reasoning

Companion agent-side work (LM Studio classifier for custom: providers)
is tracked separately on the hermes-agent side; this WebUI fix is the
belt-and-braces guard so the loop stops regardless of agent classifier
state.

Reported by @darkopetrovic. Closes #2083.

Co-authored-by: darkopetrovic <darkopetrovic@users.noreply.github.com>
(cherry picked from commit efeae4a86e)
2026-05-12 00:04:11 +00:00
nesquena-hermes 02ecc5aeea fix(tests): provide LOCALES on TestVoiceModePreferenceGate
PR #2067 made TestVoiceModePreferenceGate.test_settings_pane_has_voice_mode_i18n_keys
adaptive via self.LOCALES but only defined LOCALES on the sibling class
TestComposerVoiceButtonI18n. AttributeError on CI.

Mirror the tuple to TestVoiceModePreferenceGate so the count assert resolves
to 10 with Italian present.

Co-authored-by: Samuel Gudi <samuel.gudi.official@gmail.com>
2026-05-11 23:14:20 +00:00
Samuel Gudi 23a2ad818f fix(tests): update hardcoded locale counts for Italian (it)
6 test files had hardcoded locale counts/lists that broke when
the Italian locale block was added:

- test_issue1488_composer_voice_buttons.py: added 'it' to LOCALES,
  replaced assert count == 9 with len(self.LOCALES)
- test_issue1560_password_env_var_lock.py: added 'it' to LOCALES
- test_1560_password_env_var_no_op.py: added 'it' to EXPECTED_LOCALES
- test_login_locale_parity.py: bumped floor from 9 to 10, added 'it'
- test_stage268_opus_followups.py: bumped floor from 9 to 10

(cherry picked from commit f5e42cec9b)
2026-05-11 23:13:55 +00:00
ai-ag2026 98c9a3de72 test: tighten CI and console hygiene
(cherry picked from commit bd9e6df71c)
2026-05-11 23:13:16 +00:00
Lumen Yang e37c69cf57 fix(agent-health): treat stale running gateway as unknown
(cherry picked from commit 4be346fece)
2026-05-11 23:13:09 +00:00
ai-ag2026 52fedbc783 feat: add per-cron toast notification toggle 2026-05-11 21:58:35 +02:00
nesquena-hermes 96ca83bf53 fix(security): drop unsafe-eval + add jsdelivr to CSP, sanitize plugin error
Opus stage-339 review SHOULD-FIX items:

1. server.py: drop 'unsafe-eval' from CSP report-only policy.
   Verified by grepping all production JS — zero matches for eval(),
   new Function(), or string-form setTimeout/setInterval. Keeping it
   was a gratuitous privilege.

2. server.py: add https://cdn.jsdelivr.net to script-src + style-src.
   index.html loads Prism/xterm/katex from this CDN with SRI hashes —
   without the allowance every page load fires known-good CSP violations
   that drown out real signal once a collector is wired.

3. api/commands.py: sanitize plugin command error. Previously returned
   f'Plugin command error: {exc}' which would leak paths/env from
   FileNotFoundError('/etc/something/secret.key') etc. Now returns only
   the exception type name; full traceback goes to server log.

Test asserts updated to match the new policy shape.

Co-authored-by: Opus advisor <opus-advisor@hermes.local>
2026-05-11 17:53:02 +00:00
nesquena-hermes fd069155af Merge PR #2062 into stage-339
feat: record turn journal lifecycle events
by @ai-ag2026
2026-05-11 17:43:58 +00:00
nesquena-hermes f6ce79185c Merge PR #2059 into stage-339
feat: add crash-safe turn journal writer
by @ai-ag2026
2026-05-11 17:43:58 +00:00
nesquena-hermes 2a1244f342 Merge PR #2089 into stage-339
support slash commands implemented in hermes plugin
by @plerohellec
2026-05-11 17:43:57 +00:00
nesquena-hermes 0456fb5619 Merge PR #2085 into stage-339
fix(logs): clipboard fallback + severity filter for Logs panel (#2081)
by @bergeouss
2026-05-11 17:43:56 +00:00
nesquena-hermes 9db1da76bd Merge PR #2084 into stage-339
fix: add report-only CSP header
by @ai-ag2026
2026-05-11 17:43:55 +00:00
nesquena-hermes 6a016dae6c Merge PR #2077 into stage-338
Refactor compression anchor visibility helpers
by @franksong2702
2026-05-11 17:17:25 +00:00
nesquena-hermes 98b6925333 Merge PR #2065 into stage-338
Fix session recovery polish
by @franksong2702

# Conflicts:
#	CHANGELOG.md
2026-05-11 17:17:24 +00:00
nesquena-hermes 0662f0986f Merge PR #2056 into stage-338
Fix custom provider name slugs with ports
by @franksong2702

# Conflicts:
#	CHANGELOG.md
2026-05-11 17:17:19 +00:00
nesquena-hermes 4388cb1a10 Merge PR #2068 into stage-338
fix(ui): prevent stuck sidebar spinner on completed sessions (closes #2066)
by @franksong2702
2026-05-11 17:17:05 +00:00
nesquena-hermes 2bfd538714 Merge PR #2063 into stage-338
fix: keep explicit forks out of lineage report
by @dso2ng
2026-05-11 17:17:05 +00:00
nesquena-hermes ee6c67f30c Merge PR #2074 into stage-338
Fix HERMES_HOME skill cache patching
by @franksong2702
2026-05-11 17:17:04 +00:00
nesquena-hermes da6b897e54 Merge PR #2076 into stage-338
test: add kanban locale parity check (refs #1973)
by @bergeouss
2026-05-11 17:17:03 +00:00
Philippe Le Rohellec 281a57b60a support slash commands implemented in hermes plugin 2026-05-11 09:42:40 -07:00
bergeouss 85547612fe fix(logs): clipboard fallback + severity filter for Logs panel (#2081)
- replace navigator.clipboard.writeText with _copyText (has textarea fallback)
- add severity filter dropdown (All / Errors / Warnings+)
- add _severityForLine and _filteredLogsLines helpers
- add logsSeverityFilter HTML element + CSS class hooks
- add 5 new i18n keys across all 8 locales
- update test_logs_ui_static.py to match new implementation

Closes #2081
2026-05-11 15:40:49 +00:00
ai-ag2026 80c12123d2 Merge branch 'master' into fix/csp-report-only 2026-05-11 17:26:45 +02:00
ai-ag2026 c3fea4db3e fix: add report-only CSP header 2026-05-11 17:26:20 +02:00
ai-ag2026 c864ad47af fix: address turn journal lifecycle review 2026-05-11 17:16:43 +02:00
ai-ag2026 d04d48f5a0 fix: harden turn journal submitted writes 2026-05-11 17:13:57 +02:00
ai-ag2026 c4b7a65356 test: keep local context docs ignored 2026-05-11 17:09:19 +02:00
Frank Song 6a52edf2ab Fix stale inflight purge runtime lookup 2026-05-11 21:53:43 +08:00
Frank Song 18124ced62 Refactor compression anchor visibility helpers 2026-05-11 20:56:30 +08:00
bergeouss c0ccefd322 test: add kanban locale parity check (refs #1973)
Add test_kanban_locale_parity to test_kanban_ui_static.py that asserts
every kanban_* i18n key in the English locale exists in all non-English
locale blocks. Pattern follows test_lineage_segment_locale_keys_are_defined_for_sidebar_locales.
2026-05-11 12:38:48 +00:00
Frank Song c8d110a7f0 test: align sidebar spinner state assertions 2026-05-11 20:31:00 +08:00
Frank Song a0e9c06102 Fix HERMES_HOME skill cache patching 2026-05-11 19:12:02 +08:00
ai-ag2026 d30263bcf1 test: allow top-level markdown docs 2026-05-11 12:36:35 +02:00
Frank Song f6115b78c6 Fix custom provider name slugs with ports 2026-05-11 17:24:53 +08:00
Dennis Soong 5efd287264 fix: align fork lineage projection paths 2026-05-11 17:15:22 +08:00
Frank Song 2cd10868aa Fix session recovery polish 2026-05-11 16:30:25 +08:00
Dennis Soong 1e8d65ea01 fix: keep explicit forks out of lineage report 2026-05-11 15:23:52 +08:00
ai-ag2026 4b486f2860 feat: record turn journal lifecycle events 2026-05-11 09:13:25 +02:00
ai-ag2026 5cd001d545 feat: add crash-safe turn journal writer 2026-05-11 08:49:53 +02:00
nesquena-hermes cd7107cefb test(infra): identity check by qname (CI re-imports conftest under multiple roots)
CI's pytest invocation imports conftest twice (once via the standard
tests/ discovery, once via repo-root rootdir discovery), producing two
distinct function objects with the same __qualname__ but different `is`
identity. The strict identity assertion failed because each import
created a fresh closure. Switch to __qualname__ substring check — same
guarantee (default-on state has the wrapper installed; fixture restores
the real one) without the multi-import sensitivity.
2026-05-11 06:18:13 +00:00
nesquena-hermes d9bc8360a4 test(infra): fixture swaps real functions via monkeypatch (CI-robust)
CI on Python 3.11 still failed test_allow_outbound_network_fixture_*
because the previous module-global toggle (_ALLOW_OUTBOUND=True/False)
was unreliable on the runner — the wrapper's global lookup at call time
sometimes saw False even after the fixture's True assignment.

Switch to monkeypatch-based fixture: instead of toggling a global that
the wrapper checks, restore socket.create_connection and
socket.socket.connect to their REAL captured implementations for the
duration of the test. Pytest's monkeypatch fixture handles teardown so
the wrappers are reinstalled automatically.

Rewrote the two paired tests to check function identity
(socket.create_connection is _hermes_blocked_create_connection vs. is
_REAL_CREATE_CONNECTION) instead of attempting a live outbound to
8.8.8.8:53 — direct identity check is hermetic and doesn't depend on
whether the CI runner has any outbound network access at all.
2026-05-11 06:15:46 +00:00
nesquena-hermes 6d83d16016 test(infra): tighten IPv6 unique-local check + replace self-passing fixture test
Two low-severity follow-ups from Opus regrounding review:

1. The IPv6 unique-local fc00::/7 check was `h.startswith('fc') or
   h.startswith('fd')` — too loose. It would also classify hostnames
   like 'food.example.com' or 'fdsa.test' as 'local' and silently let
   them through the block. Tightened to a regex match for canonical
   IPv6 syntax (`f[cd][0-9a-f]{0,2}:`) so only actual IPv6 addresses
   match. Same fix in both tests/conftest.py and server.py.

2. test_allow_outbound_network_fixture_unblocks was technically
   self-passing: it tried to connect to a *.invalid hostname, which is
   in the allow-list, so the real socket.create_connection would run
   regardless of whether the fixture toggled the block. Replaced with
   a public-IP-based test that actually proves the toggle works, plus
   a paired test_block_is_active_outside_the_fixture sanity test that
   proves the block is on without the fixture.

Both follow-ups noted by Opus advisor as 'defer-OK' but trivial fixes
so landing them in this batch.
2026-05-11 06:12:07 +00:00
nesquena-hermes 23cfc99738 fix(config): split hermes_cli and urlopen fallback in lmstudio branch (CI fix)
CI on Python 3.13 (clean editable install, no hermes_cli package) was still
failing the 3 lmstudio tests after the first fix attempt. Root cause: the
outer try/except in the lmstudio branch was catching ImportError from
`from hermes_cli.models import provider_model_ids`, hijacking the whole
branch and silently skipping the urlopen fallback.

Restructured into two independent tiers:
  1. hermes_cli lookup in its own try/except — ImportError logs at DEBUG
     and continues with lm_ids=[].
  2. urlopen fallback runs unconditionally when lm_ids is empty, including
     after hermes_cli import failure.

New regression test `test_lmstudio_fallback_works_when_hermes_cli_unavailable`
explicitly blocks hermes_cli via sys.meta_path and verifies the lmstudio
group still populates from the urlopen fallback. Without this test, the
CI-vs-local divergence (local env had hermes_cli installed, CI didn't)
would keep slipping through.

All 12 lmstudio-related tests pass, including the 3 #1527 tests that
broke on stage-337.
2026-05-11 06:06:58 +00:00
nesquena-hermes 12cef733e3 fix(recovery): preserve worktree metadata + workspace + message_count on state.db sidecar rebuild
PR #2053 added worktree-backed session creation. PR #2041 (shipped in
v0.51.42) added state.db sidecar reconciliation that rebuilds a missing
<sid>.json sidecar from the canonical state.db row when the JSON file is
gone (failed save, manual rm, restore-from-backup with mismatched dirs).

The two interact silently. `_state_db_row_to_sidecar()` was hard-coding
`'workspace': ''` and never propagating the four worktree_* fields from
the row to the rebuilt sidecar dict. So a worktree-backed session that
loses its sidecar and gets rebuilt from state.db:

- loses `worktree_path` → matches the empty-session sidebar filter at
  `api/models.py:1067/1107` (which spares worktree-backed empty sessions
  via `not s.get('worktree_path')`) → session disappears from the
  sidebar even though the worktree directory still exists on disk.

- loses `workspace` → downstream tools (terminal panels, file pickers
  that use `s.workspace`) operate on empty string instead of the original
  worktree path.

- always reports `message_count == 0` → contributes to the empty-session
  filter even for sessions that have messages in `state.db.messages`.

Fix:

1. `_read_state_db_missing_sidecar_rows()` SELECT now includes
   `workspace, worktree_path, worktree_branch, worktree_repo_root,
   worktree_created_at, message_count` (each gated by
   `_sql_optional_col()` so older state.db schemas without those columns
   continue to work — recovery degrades gracefully rather than 500ing).

2. `_state_db_row_to_sidecar()` propagates each field. workspace comes
   from the row if it's a string, otherwise '' (matching pre-fix behavior
   for non-worktree sessions). message_count comes from the row if
   it's an int, otherwise falls back to `len(messages)` so the rebuilt
   sidecar always has a coherent count.

3 new regression tests in tests/test_state_db_worktree_recovery.py
exercise:
- worktree session with messages → all four worktree_* fields preserved.
- non-worktree session → worktree_* fields all None (no spurious
  propagation), workspace=''.
- empty worktree session (the worst case) → confirms the rebuilt sidecar
  does NOT match the empty-session-exempt filter, so it stays visible
  in the sidebar.

Caught by Opus advisor during stage-337 review (the cross-PR interaction
between #2053 and the previously-shipped #2041 wasn't exercised by either
PR's individual test suite).
2026-05-11 06:00:13 +00:00
nesquena-hermes 2ca220eec0 fix(config): PR #1970 lmstudio branch must honor cfg.model.base_url fallback
PR #1970 added a dedicated `elif pid == "lmstudio":` branch in
`get_available_models()` that fetches the live /v1/models list when the
hermes_cli helper doesn't have ids cached. The fallback path inside that
branch only looked at `cfg["providers"]["lmstudio"]["base_url"]`, missing
the historical config shape where the URL lives under `cfg["model"]`:

  model:
    provider: lmstudio
    base_url: http://192.168.1.22:1234/v1   ← here, not under providers.lmstudio
  providers:
    lmstudio:
      api_key: local-key

3 pre-existing tests in tests/test_issue1527_lmstudio_base_url_classification
broke on stage-337 because of this — they passed on master, failed after
the PR #1970 merge.

The simpler fix is to enhance the already-introduced `_get_provider_base_url()`
helper so it falls back to `cfg["model"]["base_url"]` when
`cfg["model"]["provider"] == provider_id`, then use the helper inside the
lmstudio branch instead of a direct lookup. This keeps the previous
behaviour (where the generic configured-provider branch handled lmstudio
via the model block) while preserving PR #1970's live-discovery additions.

Belt-and-suspenders: `_get_provider_base_url()` explicitly does NOT inherit
model.base_url for providers other than the active one — if a user's config
says `model.provider: anthropic` and they have `providers.openai` configured
without a base_url, openai must still resolve to None (use SDK default),
not to the anthropic proxy URL.

6 new regression tests in tests/test_pr1970_lmstudio_base_url_fallback.py
lock the two-location lookup, the precedence rule (explicit providers entry
wins over model fallback), trailing-slash stripping, and the negative case
(model.base_url MUST NOT leak to non-active providers).

All 51 tests in the existing model-resolver + custom-provider banks still
pass.

Caught by maintainer review on stage-337 (full pytest with the new network
isolation in place surfaced the regression that the fork-CI mock-server path
would have hidden).
2026-05-11 05:59:59 +00:00
nesquena-hermes a6174d08db test(infra): hermetic network isolation — block all outbound from tests
Tests should not reach the public internet. Before this commit, an
accidentally-leaking outbound socket from the test_server fixture (real
TLS handshakes to Anthropic / Amazon / OpenRouter, sometimes triggered
by SDK-init paths that found a credential the credential-strip allowlist
missed) was adding 60+s of wall-time to a 100s test run and creating a
class of flaky failures.

This installs a default-deny socket-block at two layers:

1. Pytest process, via tests/conftest.py module-level monkey-patch on
   socket.create_connection + socket.socket.connect. Loopback / RFC1918
   private / link-local / RFC2606 reserved-TLD destinations pass through;
   anything else raises OSError("hermes test network isolation: outbound
   to ... blocked"). Tests that legitimately need real outbound opt back
   in via the new `allow_outbound_network` fixture (no current callers).

2. Test_server subprocess (server.py), via a HERMES_WEBUI_TEST_NETWORK_BLOCK=1
   environment-variable-gated guard at the top of server.py. tests/conftest.py
   sets the env var on every test_server spawn. Without this, the subprocess
   could make outbound that the pytest-side block can't see (which is exactly
   what was happening — verified via `ss -tnp` showing the server.py child
   with established ESTAB sockets to [2607:6bc0::10]:443).

In production the env var is unset, so the guard is a no-op.

Companion changes:

- test_dns_resolution_failure refactored to mock socket.getaddrinfo
  raising gaierror, instead of relying on a real DNS lookup of a
  *.invalid hostname. The test was the one outlier that genuinely
  exercised real DNS; mocking matches what every other probe-error test
  in the same file already does.

- New tests/test_conftest_network_isolation.py with 9 adversarial
  tests proving the block fires for public IPs (including the exact
  Anthropic IPv6 and Amazon IPv4 destinations we observed leaking),
  the allow-list passes loopback / RFC1918 / link-local / reserved-TLDs,
  and the opt-in fixture re-enables real outbound when needed.

Test suite: 5,120 → 5,192 (+72 net new from this commit + the regression
tests in the companion commits). Wall time: 161s → 95s on the same
hardware. No remaining outbound from any test path.
2026-05-11 05:59:42 +00:00
nesquena-hermes d86dcc12c6 Merge PR #2055: fix: duplicate assistant transcript merge 2026-05-11 05:12:05 +00:00
nesquena-hermes 44e7378be8 Merge PR #2053: feat: worktree-backed session creation
# Conflicts:
#	CHANGELOG.md
2026-05-11 05:12:00 +00:00
nesquena-hermes e3001d16fc Merge PR #2048: [security] validate workspace on import 2026-05-11 05:11:21 +00:00