
495 Commits

Author SHA1 Message Date
Manfred eeb5dc545d feat: add read-only Kanban API bridge 2026-05-04 22:56:42 +00:00
Hermes Agent 3005bfc491 chore(release): stamp v0.50.297 — 3-PR batch + Opus pass + 2 follow-ups absorbed
Constituent PRs:
  #1659 by @bergeouss — Docker readonly false-positive (closes #1658, fixes v0.50.295 regression)
  #1653 by @nesquena — OAuth cancel race fix (follow-up to v0.50.296 #1652)
  #1657 by @Michaelyklam — health diagnostics + watchdog hardening (refs #1458 Bug #3)

Opus advisor SHIP verdict on stage-297. Two follow-ups absorbed in-release:
- _deep_health_checks(stream_check=...) reuses pre-computed lock probe
- _handle_request_noblock docstring documents single-thread safety

PR #1656 closed as superseded by #1657 (same author, both target #1458,
#1657 is functional superset).

4284 → 4288 tests passing (+4).
2026-05-04 22:50:57 +00:00
test c3d6a2d6ee Stage 297: PR #1657 — Health diagnostics + persistent-host hardening (refs #1458) by @Michaelyklam 2026-05-04 22:40:53 +00:00
Michael Lam ca135c2015 fix: harden persistent WebUI health checks 2026-05-04 15:30:37 -07:00
Nathan Esquenazi b34ce63c97 fix(oauth): honor cancel during Codex device-token exchange (follow-up to #1652)
The Codex OAuth onboarding worker introduced in #1652 had a cancel-vs-worker
race: a `cancel_onboarding_oauth_flow` request that arrived while the worker
was mid-network-call (between the `live = dict(...)` snapshot and the next
status check) would be silently overridden:

  1. User clicks Cancel → server sets flow.status = "cancelled" and drops
     sensitive lifecycle fields under the lock.
  2. Worker is mid-`_poll_codex_authorization` / `_exchange_codex_authorization`
     using the local `live` snapshot it captured before the cancel.
  3. Worker calls `_persist_codex_credentials(...)` — auth.json gets written.
  4. Worker calls `_set_flow_status(flow_id, "success")` — overrides the
     cancelled status.

Net effect: the user's explicit cancel is ignored, credentials are persisted,
and the UI reports success. Reproduced with a behavioural harness that drove
a real worker thread against patched network helpers and confirmed:

  pre-fix : flow status `success`, auth.json written despite cancel
  post-fix: flow status `cancelled`, auth.json NOT written

The fix re-checks the flow status under `_OAUTH_FLOWS_LOCK` after the token
exchange completes and before persisting. If the status is no longer
`pending`, the worker exits without persisting credentials and without
overwriting the terminal status.

Regression test `test_cancel_during_token_exchange_does_not_persist_credentials`
drives the worker against threading.Event-gated network stubs to reproduce
the race deterministically and lock the new invariant.

Trace verified against fresh hermes-agent tarball — credential_pool entry
shape (`auth_type=oauth`, `source=manual:device_code`, `priority=0`, base_url)
remains compatible with `agent.credential_pool.load_pool("openai-codex")` and
the agent CLI's `_save_codex_tokens` legacy fallback path.

Tests:
- 10/10 in tests/test_issue1362_codex_oauth_onboarding.py
- Full suite: 4230 passed, 57 skipped, 3 xpassed, 0 failed in 33.82s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:49:38 -07:00
Hermes Agent db54dc594e chore(release): stamp v0.50.296 — 3-PR batch + Opus pass + 2 follow-ups absorbed
Constituent PRs (all by @Michaelyklam):
  #1640 — show TPS in assistant message headers (closes #1617) — Aaron UX APPROVED
  #1648 — session save mode config (closes #1406)
  #1650 — Codex OAuth onboarding flow (refs #1362)

Opus advisor SHIP verdict on stage-296. 14-question audit passed including
focused OAuth security review on #1650. Two minor follow-ups absorbed:
- _get_active_hermes_home() exception fallback now logs warning
- Codex credential pool find-loop accepts both legacy and current source values

#1640 has @aronprins UX gate APPROVED (default-off TPS toggle in Preferences).
#1650 ships first in-app OAuth flow — server-owned device-code lifecycle,
profile-scoped credential storage, atomic chmod-before-rename writes.
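The "atomic chmod-before-rename writes" mentioned above follow a common pattern. You write to a temp file in the target directory, tighten its mode, and then rename it into place. Below is a minimal sketch that assumes a JSON credential file. `write_secret_json` is a hypothetical helper, not the WebUI's storage code.

```python
import json
import os
import tempfile
from pathlib import Path


def write_secret_json(path: Path, payload: dict) -> None:
    """Atomically write owner-only JSON (hypothetical helper).

    The temp file lives in the destination directory so os.replace() is a
    same-filesystem rename. The mode is tightened before the rename, so the
    final path is never visible with looser permissions.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            # mkstemp already uses 0o600 on POSIX; the explicit chmod states the intent.
            os.chmod(tmp, 0o600)
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```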

4255 → 4284 tests passing (+29).
2026-05-04 21:38:26 +00:00
test c07d821586 Stage 296: PR #1650 — Codex OAuth onboarding flow (refs #1362) by @Michaelyklam 2026-05-04 21:26:52 +00:00
test 34b060d993 Stage 296: PR #1648 — session save mode config (closes #1406) by @Michaelyklam 2026-05-04 21:26:52 +00:00
Michael Lam 89099928db fix: make TPS header display optional 2026-05-04 21:26:43 +00:00
Michael Lam 3ad8846a27 fix: show TPS in assistant message headers 2026-05-04 21:26:43 +00:00
Michael Lam 259c5c4afb feat: add Codex OAuth onboarding flow 2026-05-04 14:07:16 -07:00
Michael Lam 876a670387 feat: add session save mode config 2026-05-04 14:05:49 -07:00
Hermes Agent 9aad249e5a chore(release): stamp v0.50.295 — 3-PR batch + Opus pass
Constituent PRs:
  #1637 by @Michaelyklam — protect raw pre from glued-bold lift (closes #1451)
  #1639 by @bergeouss — macOS auto-scroll race + custom:* provider list (closes #1360, #1619)
  #1642 by @nesquena-hermes — YAML/JSON/diff code block newlines (closes #1618, #1463)

Opus advisor SHIP verdict on stage-295. One observation absorbed:
- api/config.py:2533 dead-code comment per Opus (defensive belt-and-braces
  for #1619 fallback; load-bearing fix is in routes.py /api/models/live)

PR #1641 (Michaelyklam parallel-discovery duplicate of #1642) closed as
superseded; UI media adopted with co-author trailer.

4245 → 4255 tests passing (+10).
2026-05-04 18:37:52 +00:00
bergeouss 324aeaaded fix: macOS auto-scroll momentum race (#1360) + custom:* provider model list (#1619)
#1360 — On macOS WKWebView, trackpad momentum scrolling fires scroll
events that interleave with the _programmaticScroll setTimeout(0) guard.
A mid-momentum scroll event either gets swallowed (_programmaticScroll
still true) or falsely reports nearBottom (momentum hasn't settled),
keeping _scrollPinned=true and snapping the viewport back down.

Fix: rAF-debounce the scroll listener so the nearBottom check runs at
the next paint frame when the browser's scroll position has settled.
Added a hysteresis counter requiring 2 consecutive near-bottom samples
before re-pinning, preventing accidental re-pin during deceleration.
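The actual fix is in the frontend JavaScript. As a language-neutral illustration, here is a small Python sketch of the hysteresis rule. It assumes the listener takes one sample per animation frame. `ScrollPinTracker` and the 40 px threshold are hypothetical. Only the two-consecutive-samples rule comes from the message.

```python
class ScrollPinTracker:
    """Re-pin to the bottom only after N consecutive near-bottom samples.

    Meant to be fed one sample per animation frame (the rAF-debounced
    listener), so a single mid-momentum reading cannot re-pin the viewport.
    """

    def __init__(self, threshold_px: float = 40.0, required_samples: int = 2):
        self.threshold_px = threshold_px
        self.required_samples = required_samples
        self.pinned = True
        self._near_bottom_streak = 0

    def sample(self, scroll_top: float, client_height: float, scroll_height: float) -> bool:
        distance = scroll_height - (scroll_top + client_height)
        if distance <= self.threshold_px:
            self._near_bottom_streak += 1
            if self._near_bottom_streak >= self.required_samples:
                self.pinned = True
        else:
            # Any frame away from the bottom unpins immediately and resets the streak.
            self._near_bottom_streak = 0
            self.pinned = False
        return self.pinned
```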

#1619 — When a custom:* provider (e.g. custom:relay via custom_providers)
has models that overlap with auto-detected models from base_url /v1/models,
the dedup logic at config.py:2263 skipped them all. The named custom
group ended up empty, and the `continue` at line 2334 silently discarded
the auto-detected models. Result: only the default model appeared.

Fix 1 (config.py): When custom:* named group has 0 models after dedup,
fall back to auto_detected_models_by_provider instead of dropping them.
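Here is a simplified sketch of Fix 1. It assumes the dedup step works on plain lists of model ids. `auto_detected_models_by_provider` is named in the message. `build_custom_group` and its other parameters are hypothetical, and the real config.py logic handles more cases.

```python
def build_custom_group(
    provider: str,
    configured_models: list,
    already_listed: set,
    auto_detected_models_by_provider: dict,
) -> list:
    """Dedup a custom:* group; if dedup empties it, fall back to auto-detected models."""
    group = [m for m in configured_models if m not in already_listed]
    if not group and provider.startswith("custom:"):
        # Previously the empty group was skipped, silently dropping these models.
        group = list(auto_detected_models_by_provider.get(provider, []))
    return group
```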

Fix 2 (routes.py): Extended /api/models/live fallback to handle
custom:* slugs (not just bare "custom") for both custom_providers
config lookup and base_url live fetch.
2026-05-04 18:23:04 +00:00
test 6bbf913e22 Stage 294: PR #1631 — streaming stability trio (closes #1623, #1624, #1625) by @nesquena-hermes — APPROVED 2026-05-04 17:13:08 +00:00
nesquena-hermes 66b925f59d fix(cache): stamp /api/models disk cache with WebUI version + schema version (#1633)
Closes #1633. STATE_DIR/models_cache.json was persisted across server
restarts without any version stamp, so a Docker container update from
version A to B read the cache file written by version A — users saw
stale picker contents (missing models, phantom provider groups) for
up to 24 hours until either the TTL expired, an unrelated provider
edit triggered invalidate_models_cache(), or they manually deleted
the file.

Reporter Deor (Discord) updated to v0.50.292, which contained fixes
for #1538, #1539, and #1568, then did a hard refresh and cleared site
data, yet still saw byte-for-byte identical picker contents because
the server kept reading the v0.50.281 cache file off the host-mounted
state volume.

Fix:
  * _save_models_cache_to_disk() stamps payloads with _webui_version
    (resolved lazily from api.updates.WEBUI_VERSION via sys.modules
    lookup to avoid the api.config <-> api.updates circular import)
    and _schema_version = 2.
  * New _is_loadable_disk_cache() validator checks both stamps in
    addition to shape. Mismatch on either field rejects the load.
  * _load_models_cache_from_disk() calls the new validator and
    strips the disk-only metadata before returning, so the rest of
    the code sees the same shape it always did.
  * _is_valid_models_cache() kept loose (shape-only) so in-memory
    cache writes that never touch disk don't fail validation.

Schema version is independent of the WebUI version stamp so future
cache-shape changes can invalidate older releases without relying
on a tag bump alone.

Early-init edge case (api.updates not yet loaded) skips the version
check rather than wedging the boot — at worst an unstamped file is
written once and rejected on the next call.
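Below is a condensed sketch of the stamp-and-validate scheme above. The `_webui_version`/`_schema_version` keys, schema version 2, and the `sys.modules` lookup come from the message. `stamp_for_disk` and `load_from_disk` are hypothetical simplifications of `_save_models_cache_to_disk()`, `_is_loadable_disk_cache()`, and `_load_models_cache_from_disk()`, and the sketch omits file I/O.

```python
import sys
from typing import Optional

# Stamp key names and schema version 2 come from the commit message.
_SCHEMA_VERSION = 2
_STAMP_KEYS = ("_webui_version", "_schema_version")


def _current_webui_version() -> Optional[str]:
    """Look up WEBUI_VERSION lazily through sys.modules.

    Avoids importing api.updates from api.config (circular import). Returns
    None during early init, before api.updates has been loaded.
    """
    return getattr(sys.modules.get("api.updates"), "WEBUI_VERSION", None)


def stamp_for_disk(payload: dict) -> dict:
    """Return a copy of payload with the disk-only stamps added."""
    stamped = dict(payload)
    version = _current_webui_version()
    if version is not None:  # early init: written unstamped, rejected on next load
        stamped["_webui_version"] = version
    stamped["_schema_version"] = _SCHEMA_VERSION
    return stamped


def load_from_disk(data: object) -> Optional[dict]:
    """Return the payload with stamps stripped if both stamps match, else None."""
    if not isinstance(data, dict):
        return None
    if data.get("_schema_version") != _SCHEMA_VERSION:
        return None
    current = _current_webui_version()
    if current is not None and data.get("_webui_version") != current:
        return None
    return {k: v for k, v in data.items() if k not in _STAMP_KEYS}
```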

Updated existing tests/test_model_cache_metadata.py to use subset/
round-trip semantics rather than byte-for-byte equality, since the
disk payload now has additional stamps. The four response-shape
fields still round-trip verbatim; the load result is unchanged
(stamps stripped). 19 new regression tests.

4180 -> 4199 tests pass.
2026-05-04 17:03:02 +00:00
nesquena-hermes 040cb8af70 Apply Opus pre-release SHOULD-FIX + NITs (in-PR per release policy)
SHOULD-FIX: rate-limit _repair_stale_pending repair-firing telemetry. Switch
from unconditional logger.warning to age-keyed: WARNING when pending_age <
5min (the diagnostically valuable race window — actual leak-path candidates
that slipped past the grace guard) and DEBUG for the long-tail (orphaned
sidecars from prior process lifetimes). Prevents reconnect loops on stuck
sessions from flooding the log while preserving the diagnostic signal we
want for tuning _REPAIR_STALE_PENDING_GRACE_SECONDS empirically.
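Here is a sketch of the age-keyed logging policy, assuming the 5-minute cutoff described above. The logger name and `log_repair` are hypothetical.

```python
import logging

logger = logging.getLogger("webui.streaming")  # hypothetical logger name

# Repairs within 5 minutes of the pending turn starting are the
# diagnostically interesting race-window candidates.
_RACE_WINDOW_SECONDS = 5 * 60


def log_repair(session_id: str, pending_age: float) -> int:
    """Log a repair at WARNING inside the race window, DEBUG otherwise.

    Returns the level used so the policy is easy to test.
    """
    level = logging.WARNING if pending_age < _RACE_WINDOW_SECONDS else logging.DEBUG
    logger.log(level, "repaired stale pending turn for %s (age %.0fs)", session_id, pending_age)
    return level
```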

NIT: _LOCAL_SERVER_PROVIDERS expanded with lm-studio (hyphenated alias used
in some custom_providers configs and already recognized at api/config.py:2189
for SSRF host trust) and localai (LocalAI project). Test parametrize expanded
from 7 to 11 names, also covering pre-existing koboldcpp and textgen for
symmetry. +4 regression tests.

NIT (docs): CHANGELOG callout for the RFC1918 behavior change. Internal-
network OpenAI-compatible proxies now preserve the model prefix on private-IP
base_urls. Documented the migration path: configure as a custom_providers
entry to bypass the local-server detection.

NIT (deferred, optional): narrowing the heuristic to is_loopback only is
left as future work; the broader scope was an explicit goal in the bug
body and Opus flagged it as SHOULD-DISCUSS-but-not-block.

4184 -> 4188 passing. 0 regressions. ~10 LOC absorbed total.
2026-05-04 16:50:22 +00:00
nesquena-hermes bea57beba9 fix(streaming): SSE heartbeat alignment, repair grace period, local-server model id preservation (#1623, #1624, #1625)
Closes #1623 — Lower SSE app heartbeat from 30s to 5s at every long-lived
handler (main agent, terminal, gateway-watcher, approval-poller, clarify-poller).
Kernel TCP keepalive declares peer dead at 25s worst-case (10s KEEPIDLE +
5s KEEPINTVL * 3 KEEPCNT, added v0.50.289 #1581). 30s app heartbeat let the
kernel tear sockets down on flaky networks before the app sent its first
keepalive byte — drops at ~10s during long thinking phases. New named
constant _SSE_HEARTBEAT_INTERVAL_SECONDS=5; regression test pins the
inequality (app_heartbeat * 2 <= kernel_window) so future tuning can't
re-introduce the misalignment.

Closes #1624 — Add 30s grace period to _repair_stale_pending() trigger.
Without it, any narrow race between the streaming thread clearing
pending_user_message and STREAMS.pop(stream_id) produces a false-positive
'Previous turn did not complete.' marker on a turn that finished correctly
(reproducible after every command-approval turn). Defense-in-depth, not
the root-cause fix — the actual streaming-thread leak path is tracked
separately. Falsy pending_started_at (legacy sidecars) treated as
'old enough' so legitimate legacy-data recovery still works. Plus
logger.warning telemetry on every legitimate repair so the next batch of
user reports tells us whether the underlying race still fires.
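Here is a sketch of the grace-period check, assuming timestamps are in seconds since the epoch. `_REPAIR_STALE_PENDING_GRACE_SECONDS` and the 30 s value come from these messages. `should_repair_stale_pending` is hypothetical. Whether the real comparison uses `>` or `>=` is an assumption.

```python
import time
from typing import Optional

_REPAIR_STALE_PENDING_GRACE_SECONDS = 30


def should_repair_stale_pending(pending_started_at: Optional[float], now: Optional[float] = None) -> bool:
    """Repair only pending turns at least as old as the grace period.

    A falsy timestamp (legacy sidecars without pending_started_at) counts as
    old enough, so legacy-data recovery keeps working.
    """
    if not pending_started_at:
        return True
    now = time.time() if now is None else now
    return now - pending_started_at >= _REPAIR_STALE_PENDING_GRACE_SECONDS
```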

Closes #1625 — Local model servers (LM Studio, Ollama, llama.cpp, vLLM,
TabbyAPI, koboldcpp, textgen-webui) now keep the full HuggingFace-style
model id (e.g. 'qwen/qwen3.6-27b' instead of stripped 'qwen3.6-27b'). New
_LOCAL_SERVER_PROVIDERS set + _base_url_points_at_local_server() loopback/
RFC1918 heuristic — either signal triggers no-strip. Backward compat
preserved for OpenAI-compatible proxies on public hosts (LiteLLM at
litellm.example.com still strips openai/gpt-5.4 -> gpt-5.4). Updated the
existing #230/#433 test to reflect that #1625 supersedes the strip-on-custom
rule for loopback hosts (see api/config.py and test_model_resolver.py
docstring update). Reported by @akarichan8231 in Discord on 2026-05-04.

42 regression tests across:
  tests/test_issue1623_sse_heartbeat_alignment.py (3)
  tests/test_issue1624_repair_stale_pending_grace.py (9)
  tests/test_issue1625_local_server_model_id_preservation.py (30)

4142 -> 4184 passing. 0 regressions.
2026-05-04 16:49:43 +00:00
Hermes Agent f3e066b53c chore(release): stamp v0.50.293 — 3-PR batch + 2 Opus follow-ups absorbed
Constituent PRs:
  #1627 by @franksong2702 — show Hermes Agent version (closes #1606)
  #1629 by @nesquena-hermes — profile isolation trio (closes #1611, #1612, #1614)
  #1630 by @Michaelyklam — provider config cleanup regression test (#1597 follow-up)

Opus advisor SHIP verdict + 2 SHOULD-FIX absorbed in-release:
- load_projects() re-reads from disk inside lock to close migration startup race
- _detect_agent_version() uses --dirty for symmetry with _detect_webui_version()

4142 → 4180 tests passing.
2026-05-04 16:33:57 +00:00
test 838645fd50 Stage 293: PR #1629 — profile isolation trio (closes #1611, #1612, #1614) by @nesquena-hermes — APPROVED 2026-05-04 16:21:29 +00:00
test 341b4c7abd Stage 293: PR #1627 — show Hermes Agent version in Settings (closes #1606) by @franksong2702 2026-05-04 16:20:39 +00:00
nesquena-hermes 6bc0f9c4d5 Apply Opus pre-release SHOULD-FIX + NITs (in-PR per release policy)
SHOULD-FIX #1 (renamed-root client cross-alias): drop strict-equality client
filter at static/sessions.js:1853. Server-side _profiles_match cross-aliases
'default'-tagged rows to a renamed root 'kinni'; the strict-equality client
would reject them, dropping every legacy session for renamed-root users. The
server is now solely authoritative for profile scoping.

SHOULD-FIX #2 (messaging-source dedupe ordering): _keep_latest_messaging_session_per_source
now runs AFTER the profile filter at api/routes.py:2078. Before, it ran on
the merged-cross-profile list with profile-blind keys, discarding the older
profile's row across profiles before the scope filter — leaving zero rows for
any messaging identity the active profile shared with another profile.
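Here is a toy illustration of why the ordering matters. Plain equality stands in for `_profiles_match()`, and the row fields (`source`, `profile`, `updated`) are hypothetical.

```python
def keep_latest_per_source(rows: list) -> list:
    """Keep only the newest row per messaging source (profile-blind key)."""
    latest = {}
    for row in rows:
        key = row["source"]
        if key not in latest or row["updated"] > latest[key]["updated"]:
            latest[key] = row
    return list(latest.values())


def scope_then_dedupe(rows: list, active_profile: str) -> list:
    """Fixed order: filter to the active profile first, then dedupe."""
    scoped = [r for r in rows if r["profile"] == active_profile]
    return keep_latest_per_source(scoped)
```

The old order runs `keep_latest_per_source(rows)` first. It can keep only another profile's newer row, so the later profile filter returns nothing.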

NIT #3: _projects_migrated flag now set only AFTER successful save_projects.
NIT #4: cleaned dead test code in test_is_root_profile_invalidation_drops_stale.
NIT #5: _create_profile_fallback's clone_from=='default' literal now routes
through _is_root_profile() for parity with the 5 other callsites.

+2 regression tests pin the SHOULD-FIX shapes:
- test_keep_latest_messaging_runs_after_profile_filter (source-string ordering)
- test_static_sessions_js_trusts_server_profile_scoping (no client re-filter)

4173 -> 4175 tests pass. 0 regressions.
2026-05-04 16:17:26 +00:00
Michael Lam b6c695e1ab test: cover provider config cleanup path 2026-05-04 09:04:07 -07:00
nesquena-hermes e8862632ed fix(profiles): scope sessions, projects, and root-profile resolution to active profile (#1611, #1612, #1614)
Closes #1611 — /api/sessions filters by active profile by default; ?all_profiles=1
opt-in for aggregate views; new _profiles_match() helper honours renamed-root
cross-aliasing; static/sessions.js drops the s.is_cli_session bypass; toggle-on
re-fetches with all_profiles=1 instead of slicing client-cached rows.

Closes #1612 — new _is_root_profile() central helper consults list_profiles_api()
for is_default=True matches alongside the legacy 'default' alias. Replaces five
literal-default callsites in api/profiles.py. Memoized with explicit invalidation
hooks at create + delete. Sticky active_profile file write now stores '' for
renamed root, consistent with the legacy empty==root contract.
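Below is a sketch of the root-profile cross-aliasing rule. It assumes the caller passes in the current root name. The real `_is_root_profile()` derives that name from `list_profiles_api()` and memoizes it. `is_root_profile` and `profiles_match` here are hypothetical simplifications, and treating a missing profile as root is an assumption.

```python
from typing import Optional


def is_root_profile(name: Optional[str], root_name: str) -> bool:
    """'' (legacy empty), the legacy 'default' alias, and the current root name all mean root."""
    return (name or "") in ("", "default", root_name)


def profiles_match(row_profile: Optional[str], active_profile: Optional[str], root_name: str) -> bool:
    """Match exact names, and cross-alias any spelling of the root profile."""
    if is_root_profile(row_profile, root_name) and is_root_profile(active_profile, root_name):
        return True
    return row_profile == active_profile
```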

Closes #1614 — projects carry a profile field stamped at create-time;
/api/projects filters by active profile; /api/projects/{create,rename,delete}
and /api/session/move reject ops on cross-profile projects with 404; new
_PROJECTS_MIGRATION migration in load_projects() back-tags untagged projects
from any session that uses them, falling back to 'default'; ensure_cron_project
keys lookup by (name, profile) so each profile gets its own Cron Jobs project.
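Here is a sketch of the back-tagging migration and the per-profile Cron Jobs lookup. The field names (`id`, `project_id`, `profile`, `name`), the first-session-wins rule, and `get_or_create_cron_project` are assumptions. This is not the code in `load_projects()` or `ensure_cron_project`.

```python
def migrate_project_profiles(projects: list, sessions: list) -> bool:
    """Tag untagged projects from a session that uses them, else 'default'.

    Returns True if anything changed, so the caller knows to save.
    """
    profile_by_project = {}
    for s in sessions:
        pid, profile = s.get("project_id"), s.get("profile")
        if pid and profile:
            profile_by_project.setdefault(pid, profile)  # first session wins (assumption)
    changed = False
    for p in projects:
        if not p.get("profile"):
            p["profile"] = profile_by_project.get(p["id"], "default")
            changed = True
    return changed


def get_or_create_cron_project(projects: list, profile: str, name: str = "Cron Jobs") -> dict:
    """Look up by (name, profile) so each profile gets its own Cron Jobs project."""
    for p in projects:
        if p.get("name") == name and p.get("profile") == profile:
            return p
    project = {"id": f"cron-{profile}", "name": name, "profile": profile}
    projects.append(project)
    return project
```

The boolean return value fits the later NIT #3 ordering. The caller saves first and sets any "migrated" flag only after the save succeeds.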

31 regression tests (9+11+11) pin the renamed-root resolution, server-side
profile scoping shape, helper invariants, cross-alias matching, migration
behavior, and active-profile guards on every project mutation endpoint.
4148 tests pass.

Reporter: @stefanpieter

Co-authored-by: stefanpieter <noreply@github.com>
2026-05-04 16:03:05 +00:00
Frank Song 59efb42dcd Show Hermes Agent version in settings 2026-05-04 23:57:56 +08:00
Hermes Agent 1549a10510 chore(release): stamp v0.50.292 — 12-PR batch + Opus follow-ups absorbed
Constituent PRs:
  #1597 by @Michaelyklam — pytest config-path isolation
  #1598 by @Michaelyklam — multi-tab SSE broadcast (closes #1584)
  #1599 by @Sanjays2402 — _pending_started_at truthy-check (closes #1595)
  #1600 by @Michaelyklam — streaming markdown subpath/fallback
  #1601 by @Michaelyklam — subpath frontend routes
  #1602 by @ai-ag2026 — cross-source continuation
  #1603 by @ai-ag2026 — git remote name preservation
  #1605 by @ai-ag2026 — update banner branch labels
  #1608 by @franksong2702 — cron broad-except removal (closes #1578)
  #1609 by @franksong2702 — server.py socket cleanup (closes #1583)
  #1621 by @franksong2702 — fork indicator polish (fixes #1613)
  #1622 by @s905060 — paste text-with-image (closes #1620)

Opus advisor SHIP verdict + 2 SHOULD-FIX absorbed in-release:
  • #1598 ordering race fixed (offline-buffer replay moved inside lock)
  • #1601 sessions.js:1440 gateway SSE probe baseURI parity fix

4117 → 4142 tests passing.
2026-05-04 15:45:41 +00:00
test 21eb8a89bf Stage 292: PR #1598 — broadcast SSE stream events to multiple tabs (closes #1584) by @Michaelyklam 2026-05-04 15:34:17 +00:00
test 8a10532d29 Stage 292: PR #1601 — keep frontend routes under subpath mounts by @Michaelyklam 2026-05-04 15:34:08 +00:00
test b6702fbeae Stage 292: PR #1602 — keep cross-source continuations separate in sidebar by @ai-ag2026 2026-05-04 15:33:32 +00:00
test 51848fb67d Stage 292: PR #1603 — preserve git remote names in update links by @ai-ag2026 2026-05-04 15:33:32 +00:00
test 165356e744 Stage 292: PR #1608 — tighten worker-side broad-except in _run_cron_tracked (closes #1578) by @franksong2702 2026-05-04 15:33:32 +00:00
test 5b4ab72452 Stage 292: PR #1597 — isolate pytest Hermes config path by @Michaelyklam 2026-05-04 15:33:32 +00:00
Frank Song cdcd6021cc fix(cron): tighten worker-side broad-except in _run_cron_tracked (closes #1578)
Remove the try/except Exception wrapper around
cron_profile_context_for_home(...).__enter__() in _run_cron_tracked.
A silent fallback to ctx=None would leave the worker thread unpinned
against process-global HERMES_HOME, silently corrupting cross-profile
state — the same class of bug as #1573.

Add regression test to catch any future re-introduction.
2026-05-04 16:28:33 +08:00
Manfred 3c93d5a702 fix: keep cross-source continuations separate in sidebar 2026-05-04 09:30:47 +02:00
Manfred 93251e5bcb fix: preserve git remote names in update links 2026-05-04 09:30:47 +02:00
Michael Lam e9d7d5e427 fix: keep frontend routes under subpath mounts 2026-05-04 00:06:58 -07:00
Sanjay Santhanam 14fac05dc9 fix(streaming): use truthy-check for _pending_started_at fallback
Switch the per-turn duration fallback from `is not None` to a truthy check so
None, missing-attr, and an explicit 0 all uniformly fall back to time.time().

Without this, a 0 timestamp (e.g. via a buggy migration or manual file edit)
would yield `time.time() - 0` ≈ wall-clock-since-epoch, displaying nonsense
like 'Done in 56 years 4 months ...'. In practice pending_started_at is always
set via int(time.time()) so this is a hardening fix, not a live-bug fix.
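Here is a sketch of the truthy-check fallback. `turn_duration_seconds` and its `now` parameter (added for testability) are hypothetical.

```python
import time
from typing import Optional


def turn_duration_seconds(pending_started_at: Optional[float], now: Optional[float] = None) -> float:
    """Seconds since the turn started, falling back to "now" for missing timestamps."""
    now = time.time() if now is None else now
    # Truthy check: None and 0 both fall back to `now`, giving a ~0 s duration.
    # An `is not None` check would let 0 through and report time since the Unix epoch.
    started = pending_started_at or now
    return now - started
```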

Also drop the brittle source-string assertion in the regression test that
pinned the literal expression. The behavioural test
test_done_handler_persists_duration_on_last_assistant_message already proves
the duration field is set; pinning the source line broke twice during the
v0.50.290 release pipeline alone (Opus tightening + maintainer revert).

Fixes #1595

Signed-off-by: Sanjay Santhanam <51058514+Sanjays2402@users.noreply.github.com>
2026-05-03 23:21:19 -07:00
Michael Lam 22187d2b4c fix: resolve provider config cleanup path 2026-05-03 23:13:10 -07:00
Michael Lam 6c5bc95b3b fix: broadcast SSE events to all tabs 2026-05-03 22:43:11 -07:00
nesquena-hermes 3369a08f37 fix(updates): use merge-base for compare URL so 'What's new?' link resolves
Closes #1579.

api/updates.py was building the GitHub compare URL from local HEAD short SHA:

    repoUrl + '/compare/' + curSha + '...' + newSha
    where curSha = `git rev-parse --short HEAD`

Whenever local HEAD diverges from upstream — unpushed work, dirty stage
branches, forks, in-flight rebases, release-time merge commits whose SHA
only lives in the maintainer's local history — the compare URL points at
a SHA github.com has never seen and returns the standard 404 page.

Reporter (@ai-ag2026) observed:
  https://github.com/nesquena/hermes-webui/compare/c660c7f...86cb22e
  → 404 because c660c7f was an unpushed local commit.

The right base is `git merge-base HEAD <compare_ref>`: the most recent
commit that local and upstream share. Since `git fetch` succeeded just before,
compare_ref matches upstream, and the merge-base (an ancestor of compare_ref)
is guaranteed to exist on the upstream GitHub repo.

Behavior matrix:
  Pure-behind clone (no local commits): merge-base == HEAD; URL unchanged.
  Behind + local-only commits (#1579):  merge-base != HEAD; URL points at
                                        public ancestor instead of local HEAD.
  merge-base failure (shallow clone):   current_sha=None; JS link guard
                                        suppresses link rather than emitting
                                        a known-broken URL.
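Below is a sketch of the merge-base approach, calling the `git` CLI through `subprocess`. `git merge-base` and `git rev-parse --short` are standard git commands. `merge_base_short_sha` and `compare_url` are hypothetical helpers, not the functions in api/updates.py or static/ui.js.

```python
import subprocess
from typing import Optional


def _git(repo_dir: str, *args: str) -> str:
    """Run a git command in repo_dir and return stripped stdout; raises on failure."""
    return subprocess.run(
        ["git", "-C", repo_dir, *args],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def merge_base_short_sha(repo_dir: str, compare_ref: str) -> Optional[str]:
    """Short SHA of the newest commit shared by HEAD and compare_ref, or None."""
    try:
        base = _git(repo_dir, "merge-base", "HEAD", compare_ref)
        if not base:
            return None
        return _git(repo_dir, "rev-parse", "--short", base) or None
    except (subprocess.CalledProcessError, OSError):
        # Shallow clone, unknown ref, not a repo, or git not installed:
        # return None so the caller suppresses the link.
        return None


def compare_url(repo_url: str, current_sha: Optional[str], new_sha: str) -> Optional[str]:
    """Build the 'What's new?' compare link, or None when there is no safe base."""
    if not current_sha:
        return None
    return f"{repo_url}/compare/{current_sha}...{new_sha}"
```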

Also hardens static/ui.js: reset the link's href and display:none on every
banner render, so a stale link from a prior render can't survive a re-render
where the new payload has current_sha=null.

Tests:
  - test_current_sha_is_merge_base_not_local_HEAD — reporter's scenario
  - test_current_sha_equals_HEAD_when_no_local_commits — backward compat
  - test_current_sha_falls_back_to_None_when_merge_base_fails — defensive
  - test_whats_new_link_resets_display_and_href_on_every_render
  - test_whats_new_link_suppressed_when_curSha_falsy
  - test_reporter_url_shape_no_longer_produces_invalid_compare_url

4094 → 4100 passing. 0 regressions.
2026-05-04 05:26:19 +00:00
Hermes Bot d15b0a2929 Stage 290: PR #1592 — turn duration display 'Done in 1m 12s' by @Michaelyklam 2026-05-04 04:51:43 +00:00
Hermes Bot 38a9878821 Stage 290: PR #1591 — first-turn sidebar visibility (optimistic upsert) by @Michaelyklam 2026-05-04 04:51:43 +00:00
Michael Lam 0eddb0580e fix: document turn duration fallback 2026-05-03 21:12:07 -07:00
Michael Lam f3fa106cd7 feat: show agent turn duration 2026-05-03 20:20:17 -07:00
Michael Lam 9ed0639319 fix: show first-turn chats in sidebar immediately 2026-05-03 20:10:05 -07:00
Michael Lam c93c7efd20 docs: explain relative login script path 2026-05-03 19:44:02 -07:00
Michael Lam f0e6a9b788 fix: keep login assets out of service worker cache 2026-05-03 18:18:27 -07:00
Hermes Bot c07999f0ce Stage 288: PR #1572 — collapse duplicate provider groups (closes #1568) by @nesquena-hermes — APPROVED 2026-05-03 22:37:43 +00:00
Hermes Bot 421f40c2cf Stage 288: PR #1571 — cron profile isolation (closes #1573) by @kowenhaoai — APPROVED + reviewer fix + post-review tightening 2026-05-03 22:37:43 +00:00
nesquena-hermes df03055def Address review feedback: tighten profile-resolution error handling
Three small follow-ups from the review:

1. Remove the over-broad except Exception around get_active_hermes_home()
   in _handle_cron_run. The function is in-memory dict reads + one
   Path.is_dir() stat — if it raises from inside a request handler,
   api.profiles is in a state we shouldn't be making cron decisions in.
   A silent fallback to _profile_home=None re-introduces the exact
   bug #1573 fixes (worker thread runs unpinned against process-global
   HERMES_HOME). Better to 500 the request than risk silent cross-
   profile state corruption.

2. Add a thread-safety note on os.environ mutation in api/profiles.py
   explaining why _cron_env_lock is sufficient: a single CPython env-var
   assignment is GIL-protected at the bytecode level, but the multi-step
   read-modify-write pattern (snapshot prev → assign new → restore on
   exit) is not atomic without explicit serialization. The lock makes
   the entire context-manager body run-to-completion serially, including
   any subprocess.Popen() calls inside run_job() that inherit the env.
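Here is a sketch of the serialized snapshot/assign/restore pattern described in item 2. `_cron_env_lock` and `HERMES_HOME` come from the message. `pinned_hermes_home` is a hypothetical name. The lock only serializes threads that go through it.

```python
import os
import threading
from contextlib import contextmanager

_cron_env_lock = threading.Lock()


@contextmanager
def pinned_hermes_home(home: str):
    """Serialize the snapshot -> assign -> restore sequence on HERMES_HOME.

    Holding the lock for the whole body means no other caller of this context
    manager can observe or clobber the temporary value, and child processes
    spawned inside the body inherit the pinned value.
    """
    with _cron_env_lock:
        prev = os.environ.get("HERMES_HOME")
        os.environ["HERMES_HOME"] = home
        try:
            yield
        finally:
            if prev is None:
                os.environ.pop("HERMES_HOME", None)
            else:
                os.environ["HERMES_HOME"] = prev
```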

3. New regression test (test_cron_run_does_not_silently_swallow_profile_resolution_errors)
   pinning the no-silent-fallback contract via source-level assertion.
   Catches future re-introduction of the over-broad except clause.

Co-authored-by: kowenhaoai <kowenhaoai@users.noreply.github.com>
2026-05-03 22:29:57 +00:00