diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2ce43aee..3f77419b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,34 @@
 # Hermes Web UI -- Changelog
 
+## [v0.50.288] — 2026-05-03
+
+### Fixed (3 PRs — picker symmetry + cron profile isolation — closes #1567, #1568, #1573)
+
+- **Nous Portal endpoint disagreement + featured-set cap** (#1569; closes #1567) — reporter (Deor, Discord, relayed by @AvidFuturist) saw Settings → Providers card showing `"Nous Portal — 396 models · OAuth"` while the in-conversation model picker dropdown listed only the 4 hardcoded curated entries (Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.4 Mini, Gemini 3.1 Pro Preview). Two related root-shape bugs bundled. **(1)** Asymmetric auth detection — `api/providers.py:get_providers` iterates ALL OAuth providers regardless of authentication state and unconditionally live-fetches the catalog, while `api/config.py:_build_available_models_uncached` only iterates providers in `detected_providers`, gated on `hermes_cli.models.list_available_providers().authenticated`. That flag can disagree with `hermes_cli.auth.get_auth_status().logged_in`, so when the disagreement happens for Nous, the picker silently falls through to the curated 4-entry static list while the providers card keeps showing the live catalog. **Fix:** added explicit `get_auth_status("nous").logged_in` check after the existing `list_available_providers()` loop — picker now includes Nous whenever the providers card would. **(2)** UX cap — even with the disagreement fixed, dumping a 397-model catalog into a flat dropdown is unusable. New `_build_nous_featured_set()` helper at `api/config.py:965` runs the same algorithm in both `/api/models` and `/api/models/live` so background enrichment doesn't undo the trim. Selection rules (deterministic): sticky-selection always pinned, every curated flagship preserved, vendor round-robin via `_NOUS_VENDOR_PRIORITY` for top-up to 15. Disclosure pattern: optgroup label `"Nous Portal (15 of 397)"`, new `extra_models` field on the API surface, slash command + `_dynamicModelLabels` map hydrated from both halves so a model selected outside the featured slice still renders with its proper label, providers card uses `models_total` for the header count + small `+N more` disclosure pill at the end of the rendered pill list. **(3)** Stale-fallback poisoning — when authenticated AND live-fetch returns `[]` (transient hermes_cli failure, OAuth refresh in flight, cache miss), omit the Nous group entirely rather than falling back to stale-4 (which actively contradicts the providers card instead of self-healing). Static fallback only when `hermes_cli` is unavailable or raises (test envs, package mismatches). 20 new tests in `tests/test_issue1567_nous_picker_capacity_and_symmetry.py` covering selection helper invariants, large-catalog cap behavior, detection symmetry, live-fetch-empty handling, providers/picker symmetry, frontend extras contract.
+
+- **Cron Scheduled Jobs panel respects per-request active profile** (#1571 by @kowenhaoai; closes #1573) — `/api/crons*` endpoints called into `cron.jobs` (from `hermes-agent`), whose path resolver reads `HERMES_HOME` from `os.environ` at call time. The WebUI's per-request profile isolation (#798) is thread-local — set per-request from the `hermes_profile` cookie in `server.py`, cleared after the request — so those two mechanisms didn't talk to each other and `cron.jobs` always saw the process-default `HERMES_HOME` no matter which profile the request belonged to. CRUD operations silently wrote to the wrong `jobs.json`. **Fix:** two new context managers in `api/profiles.py:139-260`, both holding a module-level `_cron_env_lock`. `cron_profile_context()` is the HTTP-side variant (resolves home via `get_active_hermes_home()` which honors the TLS cookie, swaps `os.environ['HERMES_HOME']`, re-patches the cached `cron.jobs.HERMES_DIR/CRON_DIR/JOBS_FILE/OUTPUT_DIR` module constants, restores everything on exit). `cron_profile_context_for_home(home)` is the thread-side variant (worker threads have no TLS context, so the HTTP handler captures the active home at dispatch time and passes it explicitly). All 12 cron endpoints wrapped (6 GET + 6 POST). `_handle_cron_run` additionally captures the TLS-active home at dispatch and forwards it into `_run_cron_tracked(job, profile_home)` so cron output files land in the correct profile directory. Pre-release reviewer pushed test-skip-on-missing-agent fix so machines without `~/hermes-agent` run the suite cleanly. Post-review tightening: removed an over-broad `except Exception` around `get_active_hermes_home()` in `_handle_cron_run` (silent fallback to `_profile_home=None` would have re-introduced the exact bug the PR fixes — let any unexpected exception 500 the request rather than risk silent cross-profile state corruption); added thread-safety note on `os.environ` mutation explaining why `_cron_env_lock` is sufficient given CPython GIL semantics + `subprocess.Popen` env inheritance at fork time. 4 regression tests in `tests/test_scheduled_jobs_profile_isolation.py`. Two follow-up issues filed for architectural concerns (#1574 lock granularity, #1575 in-process scheduler bypass) — both deferred as out of scope. **Verified end-to-end via real browser test on isolated environment** (12 sessions, 3 projects, 6 default crons + 1 work-only-cron, 2 profiles): UI profile switch → cron tab auto-refreshes to show only target profile's jobs, both directions; on-disk verification confirmed perfect isolation in `~/.hermes/cron/jobs.json` (default profile) vs `~/.hermes/profiles/work/cron/jobs.json`.
+
+- **Collapse duplicate provider groups + guard provider-id-as-model.default** (#1572; closes #1568) — reporter (Deor, Discord, relayed by @AvidFuturist) saw the Settings → Default Model dropdown rendering OpenCode Go provider as TWO separate optgroups: `"OpenCode Go"` (canonical, with all 14 catalog models) and `"Opencode_Go"` (phantom group containing one self-referential entry). Three structural causes (all in `api/config.py:_build_available_models_uncached`). **(1)** Detection-path id leakage — `cfg["providers"]` keys are read verbatim, so a config with `providers.opencode_go.api_key` (underscore variant) AND another path adding the canonical `opencode-go` (e.g. via `active_provider`) end up with both in `detected_providers`, creating two distinct provider groups with the second labelled via `pid.title()` fallback as `"Opencode_Go"`. **(2)** Injection-block rogue model — the default-model injection block puts ANY `model.default` string into the picker as a fake option, so a stray `model.default: opencode_go` (provider id mistakenly used as a model id) surfaces as a phantom model labelled `"Opencode GO"`. **(3)** Empty-group bleed — when a non-canonical provider id makes it into `detected_providers` but has no entry in `_PROVIDER_MODELS`, the build loop creates an optgroup with zero models. **Fix:** new `_canonicalise_provider_id()` helper folds underscores to hyphens, lowercases, applies alias resolution only when the alias target is itself canonical in `_PROVIDER_DISPLAY` (the constraint that prevents `x-ai` from round-tripping through the alias table to `xai`). Detection-path canonicalises before adding to `detected_providers`; same treatment in the `only_show_configured` intersection. Post-collection dedup pass re-canonicalises every entry (belt-and-braces against future regressions in any of the ~25 `detected_providers.add(...)` callsites). Provider-id guard on the model.default injection block — when the injected value matches a known provider display name or alias (after underscore/case normalization), skip the injection and emit a `logger.warning`. Real unknown model IDs (newly released models, custom endpoints) still get injected — only provider-shaped values are rejected. Empty-group filter at end of build (drops optgroups with zero models, with `custom:` exemption since users may want an empty card visible as a reminder). 17 new tests in `tests/test_issue1568_duplicate_provider_groups.py` covering the helper unit, dedup E2E, model.default guard, empty-group filter. Plus one structural test fix in `tests/test_issue604_all_providers_model_picker.py:test_cfg_providers_only_adds_known` — widened the regex window from 500 → 1500 chars so the new documentation comment block doesn't push `_PROVIDER_MODELS` past the substring slice (pre-existing brittle-window pattern, not a new issue).
+
+### Tests
+
+4053 → **4094 passing** (+41 net: +20 from #1569 Nous featured-set, +17 from #1572 dedup, +4 from #1571 cron isolation). 0 regressions. Full suite in 108s.
+
+### Pre-release verification
+
+- All 41 PR-related tests pass standalone.
+- All 4094 tests pass in the full suite (clean state, no pre-existing flakes triggered).
+- Browser sanity (HTTP API checks against port 8789): 11/11 endpoints verified.
+- All modified JS files (`static/commands.js`, `static/panels.js`, `static/ui.js`) pass `node -c`.
+- **Real-world browser testing** on isolated test environment (12 sessions, 3 projects, 6 default crons + 1 work cron, 4 skills, 2 profiles): profile switch via UI updates the chip, sidebar re-renders, **cron tab auto-refreshes to show only target profile's jobs**. On-disk verification confirms perfect isolation. Profile chip + cron tab UI confirmed by vision-model.
+- Pre-release Opus advisor: SHIP AS-IS — no MUST-FIX. All 5 verification questions check out (conflict-free merge, no deadlock between `_cron_env_lock` and `_available_models_cache_lock`, subprocess env inheritance under lock verified, `_canonicalise_provider_id` dedup-pass idempotent, stale-fallback handling correct under partial network failure). One non-blocking symmetry nit on `_run_cron_tracked` worker-side broad-except flagged as a follow-up issue.
+
+### Maintainer in-stage actions
+
+- **PR rebase verified clean** (REBASE-DEFAULT rule applied). All 3 PR branches were on or near current master; rebase was no-op.
+- **#1571 post-review fix combination**: contributor's `df03055` (post-review tightening) was on `pull/1571/head` while reviewer's `d83e1d8` (test-skip-on-missing-agent) was on `origin/fix/scheduled-jobs-profile-isolation`. Cherry-picked the test-skip commit onto the contributor branch to combine both fixes before merging into stage.
+
+
 ## [v0.50.287] — 2026-05-03
 
 ### Fixed (1 PR — closes another vector for the pending-message-loss class)
diff --git a/ROADMAP.md b/ROADMAP.md
index a4a83eb5..141344da 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -2,7 +2,7 @@
 
 > Web companion to the Hermes Agent CLI. Same workflows, browser-native.
 >
-> Last updated: v0.50.287 (May 03, 2026) — 4053 tests collected
+> Last updated: v0.50.288 (May 03, 2026) — 4094 tests collected
 > Test source: `pytest tests/ --collect-only -q`
 > Per-version detail: see [CHANGELOG.md](./CHANGELOG.md)
 
diff --git a/TESTING.md b/TESTING.md
index 7ac8ffd9..43263518 100644
--- a/TESTING.md
+++ b/TESTING.md
@@ -1835,8 +1835,8 @@ Bridged CLI sessions:
 
 ---
 
-*Last updated: v0.50.287, May 03, 2026*
-*Total automated tests collected: 4053*
+*Last updated: v0.50.288, May 03, 2026*
+*Total automated tests collected: 4094*
 *Regression gate: tests/test_regressions.py*
 *Run: pytest tests/ -v --timeout=60*
 *Source: /*
diff --git a/api/config.py b/api/config.py
index 82c52398..5f0ffc06 100644
--- a/api/config.py
+++ b/api/config.py
@@ -653,6 +653,48 @@ def _resolve_provider_alias(name: str) -> str:
     return _PROVIDER_ALIASES.get(raw, name)
 
 
+def _canonicalise_provider_id(name: object) -> str:
+    """Normalise a provider id slug into a stable lowercase-hyphenated form.
+
+    Folds underscores to hyphens and lowercases the result, so a user with
+    ``providers.opencode_go.api_key`` in ``config.yaml`` and
+    ``model.provider: opencode-go`` sees ONE provider group, not two
+    (#1568). Then attempts alias resolution but only if the alias target
+    is itself a known canonical id in ``_PROVIDER_DISPLAY`` — this avoids
+    converting ``x-ai`` (canonical in WebUI's data structures) to ``xai``
+    (the hermes_cli alias target which the WebUI doesn't index by).
+
+    Examples::
+
+        opencode-go -> opencode-go (canonical, no change)
+        opencode_go -> opencode-go (underscore folded)
+        OpenCode-Go -> opencode-go (case folded)
+        OPENCODE_GO -> opencode-go (both folded)
+        z_ai -> zai (alias-resolved — zai is canonical)
+        x-ai -> x-ai (preserved — x-ai is canonical)
+
+    Empty input passes through as the empty string. Unknown ids preserve
+    their normalised form.
+    """
+    if not name:
+        return ""
+    raw = str(name).strip().lower().replace("_", "-")
+    if not raw:
+        return ""
+    # Already a canonical id known to _PROVIDER_DISPLAY/_PROVIDER_MODELS:
+    # keep as-is to avoid round-tripping through aliases (e.g. x-ai → xai).
+    if raw in _PROVIDER_DISPLAY or raw in _PROVIDER_MODELS:
+        return raw
+    # Try alias resolution. Only accept the result if it's itself a
+    # canonical id in _PROVIDER_DISPLAY — that prevents aliases pointing
+    # at non-canonical strings (legacy, hermes_cli-specific) from leaking
+    # in. Falls back to the normalised input otherwise.
+    resolved = _resolve_provider_alias(raw)
+    if resolved and resolved.lower() in _PROVIDER_DISPLAY:
+        return resolved.lower()
+    return raw
+
+
 # Well-known models per provider (used to populate dropdown for direct API providers)
 _PROVIDER_MODELS = {
     "anthropic": [
@@ -899,6 +941,122 @@ def _format_nous_label(mid: str) -> str:
     return f"{base} (via Nous)"
 
 
+# Soft cap on how many Nous Portal models surface in the picker dropdown.
+# Above this count, _build_nous_featured_set() trims the visible list to
+# ~_NOUS_FEATURED_TARGET entries; the full catalog is still returned to the
+# client under ``extra_models`` so /model autocomplete covers everything.
+# Caps reflect human scannability — a 25-row dropdown is the practical UX
+# ceiling, and per-vendor sampling at 15 keeps the flagship shape visible
+# without one vendor dominating.
+_NOUS_FEATURED_THRESHOLD = 25
+_NOUS_FEATURED_TARGET = 15
+
+# Vendor-prefix priority order for featured selection. Lower index = picked
+# earlier when sampling the live catalog. Reflects which vendors users have
+# historically reached for first via Nous Portal (driven by the curated
+# static list maintained in _PROVIDER_MODELS["nous"] and Discord feedback).
+_NOUS_VENDOR_PRIORITY = (
+    "anthropic", "openai", "google", "moonshotai", "z-ai",
+    "minimax", "qwen", "x-ai", "deepseek", "stepfun",
+    "xiaomi", "tencent", "nvidia", "arcee-ai",
+)
+
+
+def _build_nous_featured_set(
+    live_ids: list[str],
+    *,
+    selected_model_id: str | None = None,
+    target: int = _NOUS_FEATURED_TARGET,
+) -> tuple[list[str], list[str]]:
+    """Trim a Nous Portal catalog into a (featured, extras) split.
+
+    ``featured`` is what the picker dropdown renders. ``extras`` is everything
+    else — kept available so the slash-command `/model` autocomplete and the
+    ``_dynamicModelLabels`` map cover the full catalog.
+
+    Selection rules (in order, deterministic):
+
+    1. Always include the user's currently-selected model if it's in the
+       catalog (preserves selection stickiness — no orphan IDs in the
+       dropdown after a refresh).
+    2. Always include every entry from the curated static
+       ``_PROVIDER_MODELS["nous"]`` list whose id maps onto a live id —
+       those four are explicitly maintained as flagship picks.
+    3. Top up to ``target`` by walking ``_NOUS_VENDOR_PRIORITY`` round-robin
+       (one model per vendor each pass) so no vendor monopolises the slot
+       budget. Within a vendor, the original ``live_ids`` order is preserved
+       — that's the order Nous Portal returned, which approximates recency.
+
+    Returns ``(featured_ids, extras_ids)`` — both lists are subsets of
+    ``live_ids`` with disjoint membership and union equal to ``live_ids``.
+
+    For catalogs ≤ ``_NOUS_FEATURED_THRESHOLD`` entries the function is a
+    no-op: ``featured == live_ids``, ``extras == []``.
+    """
+    if not live_ids:
+        return [], []
+    if len(live_ids) <= _NOUS_FEATURED_THRESHOLD:
+        return list(live_ids), []
+
+    chosen: list[str] = []  # preserves insertion order
+    chosen_set: set[str] = set()
+
+    def _add(mid: str) -> None:
+        if mid and mid not in chosen_set:
+            chosen.append(mid)
+            chosen_set.add(mid)
+
+    # Rule 1: sticky selection. Strip "@nous:" prefix if present so we can
+    # match against the live id space (which is bare "vendor/model").
+    if selected_model_id:
+        sel = selected_model_id
+        if sel.startswith("@nous:"):
+            sel = sel[len("@nous:"):]
+        if sel in live_ids:
+            _add(sel)
+
+    # Rule 2: curated flagships. Extract the bare ids from the static list
+    # entries (which are stored as "@nous:vendor/model").
+    for static in _PROVIDER_MODELS.get("nous", []):
+        sid = static.get("id", "")
+        if sid.startswith("@nous:"):
+            sid = sid[len("@nous:"):]
+        if sid in live_ids:
+            _add(sid)
+
+    # Rule 3: vendor-priority round-robin top-up.
+    by_vendor: dict[str, list[str]] = {}
+    for mid in live_ids:
+        if mid in chosen_set:
+            continue
+        vendor = mid.split("/", 1)[0] if "/" in mid else ""
+        by_vendor.setdefault(vendor, []).append(mid)
+
+    # Walk vendors in priority order, then any leftover vendors alphabetically.
+    priority = list(_NOUS_VENDOR_PRIORITY)
+    leftover = sorted(v for v in by_vendor if v not in set(priority))
+    vendor_order = priority + leftover
+
+    # Round-robin: one model per vendor per pass until we hit the target or
+    # exhaust every bucket.
+    while len(chosen) < target:
+        added_this_pass = 0
+        for vendor in vendor_order:
+            if len(chosen) >= target:
+                break
+            bucket = by_vendor.get(vendor)
+            if not bucket:
+                continue
+            _add(bucket.pop(0))
+            added_this_pass += 1
+        if added_this_pass == 0:
+            break  # all buckets empty
+
+    # Anything not chosen becomes extras (full-catalog completion surface).
+ extras = [m for m in live_ids if m not in chosen_set] + return chosen, extras + + def _apply_provider_prefix( raw_models: list[dict], provider_id: str, @@ -1767,6 +1925,22 @@ def get_available_models() -> dict: logger.debug("Failed to get key source for provider %s", _p.get("id", "unknown")) detected_providers.add(_p["id"]) _hermes_auth_used = True + + # Belt-and-braces: list_available_providers() is the primary signal + # for OAuth providers, but its `authenticated` field can disagree + # with `get_auth_status().logged_in` on some hermes_cli versions + # (the two fields are computed via different code paths). When the + # disagreement happens for Nous Portal, the Settings → Providers + # card renders the live catalog (because api/providers.py iterates + # all OAuth providers regardless of authentication state) but the + # picker dropdown comes up empty — a confusing asymmetry reported + # in #1567. Add Nous explicitly when get_auth_status agrees so the + # picker stays in sync with the providers card. + try: + if _gas("nous").get("logged_in"): + detected_providers.add("nous") + except Exception: + logger.debug("Failed to check Nous Portal auth status") except Exception: logger.debug("Failed to detect auth providers from hermes") @@ -1844,11 +2018,21 @@ def get_available_models() -> dict: # Also detect providers explicitly listed in config.yaml providers section. # A user may configure a provider key via config.yaml providers..api_key # without setting the corresponding env var. (#604) + # + # Canonicalise the id slug here so a user with ``providers.opencode_go`` + # (underscore variant) doesn't see TWO provider groups in the picker — + # one for the canonical ``opencode-go`` from active_provider detection + # and a phantom ``Opencode_Go`` group for the config-key form (#1568). + # The same applies to mixed-case ids like ``OpenCode-Go`` and to + # legitimate aliases like ``z-ai`` → ``zai``. 
_cfg_providers = cfg.get("providers", {}) if isinstance(_cfg_providers, dict): for _pid_key in _cfg_providers: - if _pid_key in _PROVIDER_MODELS or _pid_key in cfg.get("providers", {}): - detected_providers.add(_pid_key) + _canonical = _canonicalise_provider_id(_pid_key) + if not _canonical: + continue + if _canonical in _PROVIDER_MODELS or _canonical in _cfg_providers or _pid_key in _cfg_providers: + detected_providers.add(_canonical) def _normalize_base_url_for_match(value: object) -> str: url = str(value or "").strip().rstrip("/") @@ -2115,10 +2299,30 @@ def get_available_models() -> dict: configured_providers.add(active_provider) cfg_providers = cfg.get("providers", {}) if isinstance(cfg_providers, dict): - configured_providers.update(cfg_providers.keys()) + # Canonicalise here too — same rationale as #1568 detection + # path. Without this, only_show_configured mode could + # exclude detected ``opencode-go`` because configured_providers + # only has the underscore-variant key from config.yaml. + configured_providers.update( + _canonicalise_provider_id(k) or k for k in cfg_providers.keys() + ) # Only show providers that are both detected and configured detected_providers = detected_providers.intersection(configured_providers) + # Post-collection dedup: re-canonicalise every entry so any path that + # added a non-canonical id (mixed-case from auth-store, raw config-key, + # legacy alias) gets folded onto the canonical key. Belt-and-braces for + # #1568 — protects against future regressions in any of the ~25 + # `detected_providers.add(...)` callsites without auditing each one. + # The fold is idempotent for already-canonical ids, so safe to run + # unconditionally. + if detected_providers: + _canonicalised_detected = set() + for _pid in detected_providers: + _c = _canonicalise_provider_id(_pid) or _pid + _canonicalised_detected.add(_c) + detected_providers = _canonicalised_detected + # 5. 
Build model groups if detected_providers: for pid in sorted(detected_providers): @@ -2241,43 +2445,102 @@ def get_available_models() -> dict: } ) elif pid == "nous": - # Nous Portal exposes a curated catalog (~30 models, currently) - # via inference-api.nousresearch.com. Like ollama-cloud, we + # Nous Portal exposes a curated catalog (~30 models on most + # accounts, up to several hundred for enterprise tiers) via + # inference-api.nousresearch.com. Like ollama-cloud, we # live-fetch through hermes_cli.models.provider_model_ids() # rather than relying on the static four-entry list, which - # chronically drifts out of date (#1538). Fall back to the - # static list when hermes_cli is unavailable (test envs, - # package mismatches) so the picker is never empty. + # chronically drifts out of date (#1538). + # + # When the catalog exceeds _NOUS_FEATURED_THRESHOLD (~25) + # the picker dropdown gets a curated subset to stay + # scannable — the full list is still returned under + # "extra_models" for the slash-command autocomplete and + # the dynamic-label map (#1567). The optgroup label is + # decorated with the truncation count so users know more + # exists. raw_models = [] + extra_models: list[dict] = [] + truncated_label_suffix = "" + live_fetch_failed = False try: from hermes_cli.models import provider_model_ids as _provider_model_ids live_ids = _provider_model_ids("nous") or [] - raw_models = [ - # Prefix every live id with "@nous:" so routing matches - # the explicit-provider-hint branch of resolve_model_provider - # (same convention as the curated static list — see - # tests/test_nous_portal_routing.py for the invariant). - {"id": f"@nous:{mid}", "label": _format_nous_label(mid)} - for mid in live_ids - ] except Exception: logger.warning("Failed to load Nous Portal models from hermes_cli") + live_ids = [] + live_fetch_failed = True - if not raw_models: - # Static fallback: deepcopy so dedup/prefix mutation - # below does not bleed into the module-level catalog. 
+ if live_ids: + # Sticky-selection signal: prefer the explicitly-active + # model from cfg["model"]["model"] (what the user is + # currently using) over cfg["model"]["default"] (the + # configured default suggestion). Falls back to the + # latter so first-load before any selection still works. + _model_cfg = cfg.get("model", {}) + _selected = ( + (isinstance(_model_cfg, dict) and _model_cfg.get("model")) + or default_model + or None + ) + featured_ids, extras_ids = _build_nous_featured_set( + live_ids, + selected_model_id=_selected, + ) + # Prefix every live id with "@nous:" so routing matches + # the explicit-provider-hint branch of resolve_model_provider + # (same convention as the curated static list — see + # tests/test_nous_portal_routing.py for the invariant). + raw_models = [ + {"id": f"@nous:{mid}", "label": _format_nous_label(mid)} + for mid in featured_ids + ] + extra_models = [ + {"id": f"@nous:{mid}", "label": _format_nous_label(mid)} + for mid in extras_ids + ] + if extras_ids: + # Show "(15 of 397)" so the user understands the picker + # is showing a featured subset, not a broken short list. + truncated_label_suffix = ( + f" ({len(featured_ids)} of {len(live_ids)})" + ) + elif not live_fetch_failed: + # Live-fetch returned an empty list AND did not raise — + # the user is gated as authenticated by detection above + # but the catalog endpoint replied with no models. + # Showing the static 4-entry curated list here would + # contradict the providers card (which always shows + # the live catalog) — exactly the asymmetry #1567 + # reports. Omit the Nous group entirely; the providers + # card already tells the truth, and a transient empty + # response will self-heal on the next cache rebuild. 
+ logger.warning( + "Nous Portal authenticated but live-fetch returned empty — " + "omitting from picker (will retry on next cache rebuild)" + ) + else: + # hermes_cli unavailable / raised — fall back to the + # curated 4-entry static list so the picker is never + # empty in this degraded state. This matches pre-#1538 + # behaviour for environments without hermes_cli (test + # envs, package mismatches, isolated WebUI builds). raw_models = copy.deepcopy(_PROVIDER_MODELS.get("nous", [])) if raw_models: models = _apply_provider_prefix(raw_models, pid, active_provider) - groups.append( - { - "provider": provider_name, - "provider_id": pid, - "models": models, - } - ) + # Apply the same prefix transform to extras so /model + # autocomplete sees consistent IDs across the two lists. + extras = _apply_provider_prefix(extra_models, pid, active_provider) if extra_models else [] + group_entry = { + "provider": provider_name + truncated_label_suffix, + "provider_id": pid, + "models": models, + } + if extras: + group_entry["extra_models"] = extras + groups.append(group_entry) elif pid in _PROVIDER_MODELS or pid in cfg.get("providers", {}): raw_models = copy.deepcopy(_PROVIDER_MODELS.get(pid, [])) detected_models = auto_detected_models_by_provider.get(pid, []) @@ -2332,34 +2595,69 @@ def get_available_models() -> dict: ) if default_model: - all_ids_norm = {_norm_model_id(m["id"]) for g in groups for m in g.get("models", [])} - if _norm_model_id(default_model) not in all_ids_norm: - label = _get_label_for_model(default_model, groups) - target_display = ( - _PROVIDER_DISPLAY.get(active_provider, active_provider or "").lower() - if active_provider - else "" + # Guard against provider-id values mistakenly stored in + # ``model.default``. The injection logic below puts ANY string + # into the picker as a fake option, so a stray provider id + # surfaces as a self-referential phantom model labelled e.g. + # ``Opencode GO`` — a 15th entry under the OpenCode Go group + # (#1568). 
The user's misconfig is real, but the picker is + # the wrong surface to surface it; we'd rather skip injection + # and emit a warning so the underlying config issue is logged. + _looks_like_provider_id = ( + str(default_model).strip().lower().replace("_", "-") in _PROVIDER_DISPLAY + or _canonicalise_provider_id(default_model) in _PROVIDER_DISPLAY + ) + if _looks_like_provider_id: + logger.warning( + "Suspicious model.default value %r — looks like a provider id, " + "not a model id. Skipping picker injection. Check `model.default` " + "in config.yaml.", + default_model, ) - injected = False - for g in groups: - if target_display and g.get("provider", "").lower() == target_display: - g["models"].insert(0, {"id": default_model, "label": label}) - injected = True - break - if not injected and groups: - groups.append( - { - "provider": "Default", - "provider_id": active_provider or "default", - "models": [{"id": default_model, "label": label}], - } + else: + all_ids_norm = {_norm_model_id(m["id"]) for g in groups for m in g.get("models", [])} + if _norm_model_id(default_model) not in all_ids_norm: + label = _get_label_for_model(default_model, groups) + target_display = ( + _PROVIDER_DISPLAY.get(active_provider, active_provider or "").lower() + if active_provider + else "" ) + injected = False + for g in groups: + if target_display and g.get("provider", "").lower() == target_display: + g["models"].insert(0, {"id": default_model, "label": label}) + injected = True + break + if not injected and groups: + groups.append( + { + "provider": "Default", + "provider_id": active_provider or "default", + "models": [{"id": default_model, "label": label}], + } + ) # Post-process: ensure model IDs are globally unique across groups. # When multiple providers expose the same bare model ID, prefix # collisions with @provider_id: so the frontend can distinguish them. 
_deduplicate_model_ids(groups) + # Defense-in-depth: drop any optgroup that ended up with zero models + # — those are pure UI noise. A zero-model group typically means a + # detection path added an id that has no static catalog AND the + # live-fetch returned empty (#1568 — the user's + # ``providers.opencode_go`` config-key path produced an empty + # ``Opencode_Go`` group at the end of the picker before this fix). + # Custom providers from ``custom_providers`` config are exempt — + # they may legitimately render with zero entries when the user + # hasn't filled in models yet but wants the card visible. + groups = [ + g for g in groups + if g.get("models") + or (g.get("provider_id") or "").startswith("custom:") + ] + return { "active_provider": active_provider, "default_model": default_model, diff --git a/api/profiles.py b/api/profiles.py index fc94a336..5a8229a3 100644 --- a/api/profiles.py +++ b/api/profiles.py @@ -139,6 +139,135 @@ def get_active_hermes_home() -> Path: +# ── Cron-call profile isolation (issue: Scheduled jobs ignored active profile) ─ +# `cron.jobs` reads HERMES_HOME from os.environ (process-global) at function- +# call time. That bypasses our per-request thread-local profile, so the +# `/api/crons*` endpoints always returned the process-default profile's jobs. +# This context manager swaps HERMES_HOME (and the cached module-level constants +# in cron.jobs) for the duration of a cron call, serialized by a lock so +# concurrent requests from different profiles don't race on the global env var. +# +# Thread-safety note on os.environ mutation: +# CPython's os.environ assignment is GIL-protected at the bytecode level, but +# multi-step read-modify-write sequences (snapshot prev → assign new → restore +# on exit) are NOT atomic without explicit serialization. The _cron_env_lock +# below makes the entire context-manager body run-to-completion serially, so +# all webui access to HERMES_HOME goes through one thread at a time. 
Any +# subprocess.Popen() call inside `run_job` inherits the env at fork time, +# which is also under the lock — so child processes always see a consistent +# (own-profile) HERMES_HOME, never a half-swapped state. +_cron_env_lock = threading.Lock() + + +class cron_profile_context_for_home: + """Context manager that pins HERMES_HOME to an explicit profile home path. + + Use this variant from worker threads that don't have TLS context (e.g. the + background thread started by /api/crons/run). The HTTP-side variant below + resolves the home via TLS. + """ + + def __init__(self, home: Path): + self._home = Path(home) + + def __enter__(self): + _cron_env_lock.acquire() + try: + self._prev_env = os.environ.get('HERMES_HOME') + os.environ['HERMES_HOME'] = str(self._home) + + # Re-patch cron.jobs module-level constants (see main context manager + # below for the rationale). + self._prev_cj = None + try: + import cron.jobs as _cj + self._prev_cj = (_cj.HERMES_DIR, _cj.CRON_DIR, _cj.JOBS_FILE, _cj.OUTPUT_DIR) + _cj.HERMES_DIR = self._home + _cj.CRON_DIR = self._home / 'cron' + _cj.JOBS_FILE = _cj.CRON_DIR / 'jobs.json' + _cj.OUTPUT_DIR = _cj.CRON_DIR / 'output' + except (ImportError, AttributeError): + logger.debug("cron_profile_context_for_home: cron.jobs unavailable") + except Exception: + _cron_env_lock.release() + raise + return self + + def __exit__(self, exc_type, exc_val, exc_tb): + try: + if self._prev_env is None: + os.environ.pop('HERMES_HOME', None) + else: + os.environ['HERMES_HOME'] = self._prev_env + if self._prev_cj is not None: + try: + import cron.jobs as _cj + _cj.HERMES_DIR, _cj.CRON_DIR, _cj.JOBS_FILE, _cj.OUTPUT_DIR = self._prev_cj + except (ImportError, AttributeError): + pass + finally: + _cron_env_lock.release() + return False + + +class cron_profile_context: + """Context manager that pins HERMES_HOME to the TLS-active profile. 
+ + Usage: + with cron_profile_context(): + from cron.jobs import list_jobs + jobs = list_jobs(include_disabled=True) + + Serializes cron API calls across profiles (cron API is low-frequency; + serialization cost is negligible compared to correctness). + """ + + def __enter__(self): + _cron_env_lock.acquire() + try: + self._prev_env = os.environ.get('HERMES_HOME') + home = get_active_hermes_home() + os.environ['HERMES_HOME'] = str(home) + + # Re-patch cron.jobs module-level constants. They are snapshotted at + # import time (lines 68-71 of cron/jobs.py) and don't participate in + # the module's __getattr__ lazy path, so env-var alone is not enough + # for callers that reference the module constants directly. + self._prev_cj = None + try: + import cron.jobs as _cj + self._prev_cj = (_cj.HERMES_DIR, _cj.CRON_DIR, _cj.JOBS_FILE, _cj.OUTPUT_DIR) + _cj.HERMES_DIR = home + _cj.CRON_DIR = home / 'cron' + _cj.JOBS_FILE = _cj.CRON_DIR / 'jobs.json' + _cj.OUTPUT_DIR = _cj.CRON_DIR / 'output' + except (ImportError, AttributeError): + logger.debug("cron_profile_context: cron.jobs unavailable; env-var only") + except Exception: + _cron_env_lock.release() + raise + return self + + def __exit__(self, exc_type, exc_val, exc_tb): + try: + # Restore env var + if self._prev_env is None: + os.environ.pop('HERMES_HOME', None) + else: + os.environ['HERMES_HOME'] = self._prev_env + + # Restore cron.jobs module constants + if self._prev_cj is not None: + try: + import cron.jobs as _cj + _cj.HERMES_DIR, _cj.CRON_DIR, _cj.JOBS_FILE, _cj.OUTPUT_DIR = self._prev_cj + except (ImportError, AttributeError): + pass + finally: + _cron_env_lock.release() + return False + + def get_hermes_home_for_profile(name: str) -> Path: """Return the HERMES_HOME Path for *name* without mutating any process state.
diff --git a/api/providers.py b/api/providers.py index 74b41354..86825774 100644 --- a/api/providers.py +++ b/api/providers.py @@ -392,10 +392,19 @@ def get_providers() -> dict[str, Any]: pass models = list(_PROVIDER_MODELS.get(pid, [])) + models_total = len(models) # Nous Portal: prefer the live catalog so the providers card matches # the dropdown picker (#1538). Same fallback shape as the static-only # case below — when hermes_cli is unavailable or its lookup raises, # we keep the four-entry curated list. + # + # On large-tier accounts (#1567 reporter Deor saw 396 entries), we + # render the same featured subset the picker uses so the providers + # card body doesn't become a 396-pill wall. The full count is still + # reported via models_total — surfaced in the header line as + # "396 models · OAuth" by static/panels.js — so the user knows the + # complete catalog is reachable (via /model autocomplete or a future + # "show all" disclosure if added). if pid == "nous": try: from hermes_cli.models import provider_model_ids as _provider_model_ids @@ -403,12 +412,14 @@ def get_providers() -> dict[str, Any]: live_ids = _provider_model_ids("nous") or [] if live_ids: # Lazy-import to avoid circular dep with api.config. 
- from api.config import _format_nous_label + from api.config import _format_nous_label, _build_nous_featured_set + featured_ids, _extras = _build_nous_featured_set(live_ids) models = [ {"id": f"@nous:{mid}", "label": _format_nous_label(mid)} - for mid in live_ids + for mid in featured_ids ] + models_total = len(live_ids) except Exception: logger.debug("Failed to load Nous Portal models from hermes_cli") # Also include models from config.yaml providers section @@ -420,6 +431,13 @@ def get_providers() -> dict[str, Any]: models = models + [{"id": k, "label": k} for k in cfg_models.keys()] elif isinstance(cfg_models, list): models = models + [{"id": k, "label": k} for k in cfg_models] + # Recompute models_total when config.yaml contributes additional + # entries on top of the live/static catalog. For non-Nous + # providers models_total still equals len(models); for Nous + # we keep the live count (which already includes any models + # surfaced in the curated featured slice). + if pid != "nous": + models_total = len(models) providers.append({ "id": pid, @@ -430,6 +448,14 @@ def get_providers() -> dict[str, Any]: "key_source": key_source, "auth_error": auth_error, "models": models, + # models_total reflects the complete catalog size (e.g. 396 for + # an enterprise Nous Portal account), even when "models" is + # trimmed to a featured subset for UI scannability. The frontend + # uses this for the header text "396 models · OAuth" so users + # know the full catalog exists and is reachable via the slash + # command. For providers that don't trim, models_total == + # len(models) and the frontend behaves identically to before. + "models_total": models_total, }) # Scan custom_providers from config.yaml (e.g. 
glmcode, timicc) diff --git a/api/routes.py b/api/routes.py index 1c927620..8c430178 100644 --- a/api/routes.py +++ b/api/routes.py @@ -180,12 +180,33 @@ def _cron_output_content_window(text: str, limit: int = _CRON_OUTPUT_CONTENT_LIM return text[-limit:] -def _run_cron_tracked(job): - """Wrapper that tracks running state around cron.scheduler.run_job.""" +def _run_cron_tracked(job, profile_home=None): + """Wrapper that tracks running state around cron.scheduler.run_job. + + ``profile_home`` pins HERMES_HOME for this worker thread so output files + and run metadata land in the profile that triggered the run, not the + process-global default. Captured at dispatch time because the thread runs + after the HTTP request (and its TLS profile) has already been cleared. + """ from cron.scheduler import run_job # import here — runs inside a worker thread from cron.jobs import mark_job_run, save_job_output job_id = job.get("id", "") + + # Pin HERMES_HOME for the duration of this thread using a dedicated + # context manager variant that accepts the profile home directly + # (this worker thread has no TLS profile set, so get_active_hermes_home() can't resolve it).
+ ctx = None + if profile_home is not None: + try: + from api.profiles import cron_profile_context_for_home + + ctx = cron_profile_context_for_home(profile_home) + ctx.__enter__() + except Exception: + logger.exception("Failed to pin profile %s for cron run", profile_home) + ctx = None + try: success, output, final_response, error = run_job(job) save_job_output(job_id, output) @@ -204,6 +225,11 @@ def _run_cron_tracked(job): except Exception: logger.debug("Failed to mark manual cron run failure for %s", job_id) finally: + if ctx is not None: + try: + ctx.__exit__(None, None, None) + except Exception: + logger.debug("Failed to release cron_profile_context for %s", job_id) _mark_cron_done(job_id) _PROVIDER_ALIASES = { @@ -2177,25 +2203,45 @@ def handle_get(handler, parsed) -> bool: return # SSE handled, no JSON response # ── Cron API (GET) ── + # All cron handlers touch cron.jobs which resolves HERMES_HOME from + # os.environ (process-global) at call time. Wrap in cron_profile_context + # so the TLS-active profile's jobs.json is read, not the process default. 
if parsed.path == "/api/crons": from cron.jobs import list_jobs + from api.profiles import cron_profile_context - return j(handler, {"jobs": list_jobs(include_disabled=True)}) + with cron_profile_context(): + return j(handler, {"jobs": list_jobs(include_disabled=True)}) if parsed.path == "/api/crons/output": - return _handle_cron_output(handler, parsed) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_output(handler, parsed) if parsed.path == "/api/crons/history": - return _handle_cron_history(handler, parsed) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_history(handler, parsed) if parsed.path == "/api/crons/run": - return _handle_cron_run_detail(handler, parsed) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_run_detail(handler, parsed) if parsed.path == "/api/crons/recent": - return _handle_cron_recent(handler, parsed) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_recent(handler, parsed) if parsed.path == "/api/crons/status": - return _handle_cron_status(handler, parsed) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_status(handler, parsed) # ── Skills API (GET) ── if parsed.path == "/api/skills": @@ -2892,23 +2938,43 @@ def handle_post(handler, parsed) -> bool: return _handle_terminal_close(handler, body) # ── Cron API (POST) ── + # See GET-side comment above: wrap in cron_profile_context so writes go + # to the TLS-active profile's jobs.json instead of the process default. 
if parsed.path == "/api/crons/create": - return _handle_cron_create(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_create(handler, body) if parsed.path == "/api/crons/update": - return _handle_cron_update(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_update(handler, body) if parsed.path == "/api/crons/delete": - return _handle_cron_delete(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_delete(handler, body) if parsed.path == "/api/crons/run": - return _handle_cron_run(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_run(handler, body) if parsed.path == "/api/crons/pause": - return _handle_cron_pause(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_pause(handler, body) if parsed.path == "/api/crons/resume": - return _handle_cron_resume(handler, body) + from api.profiles import cron_profile_context + + with cron_profile_context(): + return _handle_cron_resume(handler, body) # ── File ops (POST) ── if parsed.path == "/api/file/delete": @@ -4416,6 +4482,23 @@ def _handle_live_models(handler, parsed): if not ids: return _finish({"provider": provider, "models": [], "count": 0}) + # For Nous Portal, apply the same featured-set cap that + # /api/models uses so background enrichment via _fetchLiveModels() + # doesn't undo the dropdown trim — otherwise a 397-model catalog + # would still flood the picker after the initial render finished + # the cap. The full list is returned via the main /api/models + # endpoint's extra_models field for /model autocomplete; the live + # endpoint is purely a dropdown-enrichment surface, so it should + # match the dropdown's visibility budget. 
(#1567) + if provider == "nous": + try: + from api.config import _build_nous_featured_set + _default_model = (cfg.get("model", {}) or {}).get("model") if isinstance(cfg.get("model"), dict) else None + _featured, _ = _build_nous_featured_set(ids, selected_model_id=_default_model) + ids = _featured + except Exception: + logger.debug("Failed to apply Nous featured-set cap for /api/models/live") + # Normalise to {id, label} — provider_model_ids() returns plain string IDs. # For ollama-cloud use the shared Ollama formatter (handles `:variant` suffix). # For all other providers use a simpler hyphen-split capitaliser. @@ -5149,7 +5232,22 @@ def _handle_cron_run(handler, body): return j(handler, {"ok": False, "job_id": job_id, "status": "already_running", "elapsed": round(elapsed, 1)}) _mark_cron_running(job_id) - threading.Thread(target=_run_cron_tracked, args=(job,), daemon=True).start() + # Capture the TLS-active profile home now — the thread runs after the + # request finishes, so TLS is gone by then. + # + # Resolve directly without a try/except: get_active_hermes_home() does + # in-memory dict reads + a single Path.is_dir() stat, so the only way + # it could raise from inside a request handler is if api.profiles + # itself partially failed to import (in which case we'd already be + # 500-ing the whole request). A silent fallback to None here would + # re-introduce the exact bug #1573 fixes — the worker thread would + # run unpinned against the process-global HERMES_HOME — so we'd + # rather let any unexpected exception 500 the request than corrupt + # cross-profile state. 
+ from api.profiles import get_active_hermes_home + + _profile_home = get_active_hermes_home() + threading.Thread(target=_run_cron_tracked, args=(job, _profile_home), daemon=True).start() return j(handler, {"ok": True, "job_id": job_id, "status": "running"}) diff --git a/static/commands.js b/static/commands.js index 375a9d67..6dce4c48 100644 --- a/static/commands.js +++ b/static/commands.js @@ -136,6 +136,15 @@ async function _loadSlashModelSubArgs(force=false){ const id=_normalizeSlashSubArg(model&&model.id); if(id) values.push(id); } + // Include extra_models (the catalog tail that doesn't render as + //