mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 11:10:18 +00:00
Merge pull request #1638 from nesquena/stage-294
Release v0.50.294 — 3-PR batch (streaming stability trio + cache version stamp + race fix + readonly fs guard)
This commit is contained in:
@@ -1,5 +1,52 @@
|
||||
# Hermes Web UI -- Changelog
|
||||
|
||||
## [v0.50.294] — 2026-05-04
|
||||
|
||||
### Fixed (3 PRs — streaming stability trio + models cache version stamp + session race + readonly fs guard — closes #1430, #1470, #1623, #1624, #1625, #1633)
|
||||
|
||||
- **SSE app heartbeat lowered from 30s to 5s at every long-lived handler** (closes #1623) — kernel TCP keepalive (added v0.50.289 / #1581) declares a peer dead at `KEEPIDLE (10s) + KEEPINTVL (5s) × KEEPCNT (3) = 25s` worst-case. The five SSE handlers in `api/routes.py` (main agent stream, terminal, gateway-watcher, approval-poller, clarify-poller) all used 30s, which meant on flaky networks the kernel could tear the socket down before the app sent its first heartbeat byte — flaky-network drops at ~10s that users perceived as "the stream died around 10 seconds in" during long LLM thinking phases. **Fix:** new `_SSE_HEARTBEAT_INTERVAL_SECONDS = 5` constant referenced by every queue-poll site. Cost: ~150B/min when idle (12 extra heartbeats × 12 bytes), negligible. Many production SSE deployments use 5-15s app heartbeats specifically because TCP keepalive isn't reliable across all network paths (proxies, load balancers, mobile NAT). Regression test pins the inequality `app_heartbeat × 2 ≤ kernel_keepalive_window` so future tuning of either timer can't re-introduce the misalignment.
|
||||
|
||||
- **`_repair_stale_pending()` no longer fires on fresh turns** (closes #1624) — `_repair_stale_pending` in `api/models.py:716` triggered as soon as `pending_user_message` was set AND `active_stream_id` was missing from the live `STREAMS` registry. There was no time-based staleness guard, so any narrow race between the streaming thread clearing `pending_user_message` and `STREAMS.pop(stream_id)` produced a false-positive "**Previous turn did not complete.**" marker on a turn that actually finished correctly — every command-approval turn reliably reproduced this for at least one user. **Fix:** add `_REPAIR_STALE_PENDING_GRACE_SECONDS = 30` and bail when `time.time() - pending_started_at < grace`. Falsy `pending_started_at` (legacy sidecars from before the field was added in v0.50.283) is treated as "old enough" so legitimate legacy-data recovery still works. Plus a rate-limited `logger.warning`/`logger.debug` on every legitimate repair so the next batch of user reports tells us whether the underlying race still fires post-fix. **This is defense-in-depth, not the root-cause fix** — the streaming thread should never exit without clearing pending; tracked separately for future investigation.
|
||||
|
||||
- **Local model servers (LM Studio, Ollama, llama.cpp, vLLM, TabbyAPI, LocalAI) now keep their full HuggingFace-style model id** (closes #1625, reported by @akarichan8231) — `resolve_model_provider()` in `api/config.py:1149` stripped the provider prefix from a model id like `qwen/qwen3.6-27b` whenever (a) the model contained `/`, (b) `config.yaml` had `model.base_url` set, and (c) the prefix matched a known entry in `_PROVIDER_MODELS` (e.g. `qwen`, `openai`, `anthropic`, etc.). The strip is correct for OpenAI-compatible **proxies** (LiteLLM, OpenRouter relays) — `openai/gpt-5.4` → `gpt-5.4`. But local model servers are **not** proxies — they register models under their full HuggingFace path as the registry key. Stripping the prefix made LM Studio (or Ollama, llama.cpp, vLLM, TabbyAPI) miss the loaded model and silently load a brand-new instance with default settings, ignoring the user's tuned 131072 context / 4 parallel slots. **Fix:** new `_LOCAL_SERVER_PROVIDERS` set covering canonical names (`lmstudio`, `lm-studio`, `localai`, `ollama`, `llamacpp`, `llama-cpp`, `vllm`, `tabby`, `tabbyapi`, `koboldcpp`, `textgen`) and a new `_base_url_points_at_local_server()` heuristic that catches `provider: custom` + `base_url: http://localhost:1234/v1` setups too (via loopback / RFC1918 / IPv6-loopback IP detection). Either signal triggers no-strip. Backward compat is preserved for OpenAI-compatible proxies on public hosts (LiteLLM at `https://litellm.example.com/v1` continues to strip `openai/gpt-5.4` → `gpt-5.4`).
|
||||
|
||||
> **Behavior change for internal-network OpenAI-compatible proxies (RFC1918):** the loopback heuristic also matches private-IP base_urls (10/8, 172.16/12, 192.168/16). A team running an internal LiteLLM proxy at `http://10.5.0.1:1234/v1` now gets prefix preservation instead of stripping. LiteLLM accepts either form, so this is invisible in practice; users with a custom proxy on RFC1918 that requires the stripped form should configure it as a `custom_providers:` entry, which routes through the early `custom_providers` loop and never reaches the local-server detection.
|
||||
|
||||
- **`/api/models` disk cache now invalidated on every WebUI version change** (closes #1633, reported by @Deor on Discord) — `STATE_DIR/models_cache.json` was persisted across server restarts without any version stamp. A Docker container update from version A to version B read the cache file written by version A — users saw stale picker contents (missing models, phantom provider groups, e.g. the v0.50.281 4-model Nous Portal + `Opencode_Go` phantom) for up to 24 hours until either the TTL expired, an unrelated provider edit triggered `invalidate_models_cache()`, or they manually deleted the file. Reporter Deor updated to v0.50.292 — which contained fixes for #1538, #1539, and #1568 — did a hard refresh and cleared site data, and still saw byte-for-byte identical picker contents because the server kept reading the v0.50.281 cache file off the host-mounted volume. **Fix:** `_save_models_cache_to_disk()` now stamps payloads with `_webui_version` (resolved lazily from `api.updates.WEBUI_VERSION` to avoid a circular import) and `_schema_version = 2`. `_load_models_cache_from_disk()` rejects any cache where either field mismatches the runtime — every release auto-rebuilds from live provider data on the very next `/api/models` call. Legacy unstamped caches (pre-#1633 files) are also rejected, so the first read after upgrading to this release rebuilds cleanly. Schema version is independent of the WebUI version stamp so future cache-shape changes can invalidate older releases without relying on a tag bump alone. The early-init edge case (api.updates not yet loaded) skips the version check rather than wedging the boot — at worst an unstamped file is written once and rejected on the next call.
|
||||
|
||||
- **Session list race condition no longer makes today's sessions disappear** (closes #1430, reported by @Olyno) — `renderSessionList()` in `static/sessions.js` had no staleness guard. Multiple callers (message send, rename, session switch) fire it concurrently without awaiting, so a slower previous-day fetch could overwrite `_allSessions` with stale data after a faster newer fetch had already written today's data — manifesting as today's sessions disappearing when the user clicked an older conversation. **Fix:** new module-local `_renderSessionListGen` generation counter pre-incremented before the `await` and re-checked after it; stale calls (older `_gen`) self-discard before mutating state. Lightest-weight correct shape — no AbortController, no debounce, no state machine. Behavioral harness verifies three concurrent calls with varying delays correctly land only the most recently issued response. (PR #1635 by @bergeouss, AI-assisted via Hermes Agent.)
|
||||
|
||||
- **Read-only root filesystem under podman no longer crashes container startup** (closes #1470, reported by @cosmoceus) — `docker_init.bash` unconditionally called `groupmod`/`usermod` even when `/etc/group` and `/etc/passwd` were on a read-only filesystem (typical podman + `read_only=true` setup). `groupmod: cannot lock /etc/group; try again later.` killed the container at boot. **Fix:** writability check via `[ ! -w /etc/group ] || [ ! -w /etc/passwd ]`; on read-only mounts with matching UID/GID skip gracefully with a log message; on read-only mounts with mismatched UID/GID emit a clear `error_exit` directing the user to set matching IDs or disable `read_only=true`. (PR #1635 by @bergeouss.)
|
||||
|
||||
### Tests
|
||||
|
||||
4180 → **4245 passing** (+65 regression tests across `tests/test_issue1623_sse_heartbeat_alignment.py` (3), `tests/test_issue1624_repair_stale_pending_grace.py` (9), `tests/test_issue1625_local_server_model_id_preservation.py` (34, expanded for `lm-studio`/`localai`), `tests/test_issue1633_models_cache_version_stamp.py` (19); plus `tests/test_model_resolver.py` updates and `tests/test_model_cache_metadata.py` round-trip semantics). 0 regressions. Full suite ~120s.
|
||||
|
||||
### Pre-release verification
|
||||
|
||||
- **Self-built fixes** (#1631, #1636 — nesquena-hermes), independent review **APPROVED by nesquena** for both, with comprehensive end-to-end traces including reproducer harnesses for Deor's Docker-upgrade scenario (#1633) and the kernel-keepalive math (#1623).
|
||||
- **External contributor PR** #1635 by @bergeouss (AI-assisted via Hermes Agent), independent review **APPROVED by nesquena** with behavioral harness for the race fix (three concurrent fetches with varying delays — only the latest writes to state).
|
||||
- **Opus advisor pre-merge pass on #1631**: SHIP — no MUST-FIX, one SHOULD-FIX (rate-limited `_repair_stale_pending` telemetry) and three NITs (expanded `_LOCAL_SERVER_PROVIDERS`, RFC1918 CHANGELOG callout) absorbed in-PR (commit `2161fc1`).
|
||||
- **Opus advisor pre-merge pass on stage-294**: see "Opus-applied fixes" below.
|
||||
- `_SSE_HEARTBEAT_INTERVAL_SECONDS × 2 ≤ KEEPIDLE + KEEPINTVL × KEEPCNT` pinned by a regression test that derives the kernel window from `server.py` setsockopt block at runtime.
|
||||
- `_repair_stale_pending` grace guard exercised at: 5s-old turn (skip), grace-1s-old turn (skip), grace+30s-old turn (fire), missing/zero/garbage `pending_started_at` (fire — legacy compat), no pending-message (skip — pre-existing contract), live stream (skip — pre-existing contract).
|
||||
- `resolve_model_provider` exercised across local-server provider names + 7 loopback/private IP heuristic cases + backward-compat checks for OpenAI-compatible proxies on public hosts and OpenRouter pass-through. Helper `_base_url_points_at_local_server()` independently unit-tested against 11 url shapes.
|
||||
- End-to-end behavioral test (`test_docker_update_scenario_invalidates_old_cache`) reproduces Deor's exact reported scenario: a cache stamped at `v0.50.281` fails to load when runtime is `v0.50.292`, forcing a fresh rebuild that picks up the picker fixes shipped between releases.
|
||||
- Round-trip + version-mismatch + legacy-unstamped + schema-mismatch + early-init + corrupt-JSON + missing-file + atomic-overwrite + invalidate-cache-tear-down all pinned.
|
||||
- Cross-tool verified: agent has its own model-cache files at different paths (`hermes_cli/codex_models.py`, `hermes_cli/models.py`) — no collision.
|
||||
|
||||
### Opus-applied fixes (absorbed in-release)
|
||||
|
||||
**From #1631 in-PR Opus pre-merge pass (already on the PR's branch):**
|
||||
|
||||
- **SHOULD-FIX (`_repair_stale_pending` log volume)**: rate-limit the repair-firing telemetry by age — `logger.warning` for the diagnostically valuable race window (< 5 min, actual leak-path candidates that slipped past the grace guard) and `logger.debug` for the long-tail (orphaned sidecars from prior process lifetimes). Prevents reconnect loops on stuck sessions from flooding the log while preserving the diagnostic signal we want for tuning the grace constant.
|
||||
- **NIT (`_LOCAL_SERVER_PROVIDERS`)**: added `lm-studio` (hyphenated alias used in some `custom_providers:` configs) and `localai` (LocalAI project, common OpenAI-compatible local server). Test parametrize expanded to cover the new names plus pre-existing `koboldcpp` and `textgen` for symmetry.
|
||||
|
||||
**From #1636 stage-294 absorption (this release):**
|
||||
|
||||
- **Minor observation absorbed** — `_is_loadable_disk_cache()` now logs at DEBUG when rejecting (`schema=N vs M`, `version=A vs B`). Useful diagnostic when investigating future "why did my cache rebuild" questions.
|
||||
- **Code comment** added to `_is_loadable_disk_cache()` documenting that `_webui_version` is a string compare (not semver) — paired with `_schema_version` independent axis for breaking changes that lack a tag bump.
|
||||
|
||||
## [v0.50.293] — 2026-05-04
|
||||
|
||||
### Fixed (3 PRs — profile isolation trio + agent version badge + #1597 follow-up)
|
||||
|
||||
+1
-1
@@ -2,7 +2,7 @@
|
||||
|
||||
> Web companion to the Hermes Agent CLI. Same workflows, browser-native.
|
||||
>
|
||||
> Last updated: v0.50.293 (May 04, 2026) — 4180 tests collected
|
||||
> Last updated: v0.50.294 (May 04, 2026) — 4245 tests collected
|
||||
> Test source: `pytest tests/ --collect-only -q`
|
||||
> Per-version detail: see [CHANGELOG.md](./CHANGELOG.md)
|
||||
|
||||
|
||||
+2
-2
@@ -1835,8 +1835,8 @@ Bridged CLI sessions:
|
||||
|
||||
---
|
||||
|
||||
*Last updated: v0.50.293, May 04, 2026*
|
||||
*Total automated tests collected: 4180*
|
||||
*Last updated: v0.50.294, May 04, 2026*
|
||||
*Total automated tests collected: 4245*
|
||||
*Regression gate: tests/test_regressions.py*
|
||||
*Run: pytest tests/ -v --timeout=60*
|
||||
*Source: <repo>/*
|
||||
|
||||
+209
-14
@@ -1147,6 +1147,72 @@ def _deduplicate_model_ids(groups: list[dict]) -> None:
|
||||
model["label"] = f"{original_id} ({provider_name})"
|
||||
|
||||
|
||||
# ── Local-server provider preservation (#1625) ─────────────────────────────
|
||||
#
|
||||
# LM Studio, Ollama, llama.cpp, vLLM, TabbyAPI etc. are inference servers,
|
||||
# not OpenAI-compatible proxies. They register models under their FULL path
|
||||
# as the registry key (the HuggingFace-style "namespace/model" id, e.g.
|
||||
# "qwen/qwen3.6-27b"). Stripping the namespace prefix would cause a registry
|
||||
# miss and the server loads a brand-new instance with default settings,
|
||||
# silently ignoring the user's tuned context length / parallel slots.
|
||||
#
|
||||
# This is distinct from OpenAI-compatible proxies (LiteLLM, OpenRouter relays)
|
||||
# where stripping "openai/gpt-5.4" → "gpt-5.4" is the correct behavior.
|
||||
#
|
||||
# Detection has two layers:
|
||||
# 1. Static set of known local-server provider names (canonical + common
|
||||
# custom-provider naming).
|
||||
# 2. Loopback / private-host base_url heuristic: an OpenAI-compatible URL
|
||||
# pointing at 127.0.0.1, localhost, or a private IP block is almost
|
||||
# certainly a local model server, regardless of the provider name.
|
||||
# Reuses the same private-IP detection logic used elsewhere in
|
||||
# api/config.py for SSRF host trust.
|
||||
_LOCAL_SERVER_PROVIDERS = {
|
||||
"lmstudio", # canonical (in hermes_cli.models.CANONICAL_PROVIDERS)
|
||||
"lm-studio", # alias used in some custom_providers configs (#1625 Opus NIT)
|
||||
"ollama", # via custom_providers, common pattern
|
||||
"llamacpp", # via custom_providers
|
||||
"llama-cpp", # alias
|
||||
"vllm", # via custom_providers
|
||||
"tabby", # via custom_providers (TabbyAPI)
|
||||
"tabbyapi", # alias
|
||||
"koboldcpp", # local llama.cpp UI fork
|
||||
"textgen", # text-generation-webui (oobabooga) OpenAI-compat extension
|
||||
"localai", # LocalAI project (#1625 Opus NIT)
|
||||
}
|
||||
|
||||
|
||||
def _base_url_points_at_local_server(base_url: str) -> bool:
|
||||
"""True if base_url's host is a loopback or private IP (likely local server).
|
||||
|
||||
Reuses ipaddress.is_loopback / is_private / is_link_local — the same
|
||||
heuristic used in the `api/config.py` SSRF/credential-routing code.
|
||||
Errors (DNS failure, malformed URL) return False so callers fall back to
|
||||
the static-provider-name check.
|
||||
"""
|
||||
if not base_url:
|
||||
return False
|
||||
try:
|
||||
from urllib.parse import urlparse
|
||||
import ipaddress
|
||||
host = (urlparse(base_url).hostname or "").lower()
|
||||
if not host:
|
||||
return False
|
||||
# Plain-text "localhost" doesn't ipaddress-parse but is unambiguous.
|
||||
if host in ("localhost", "ip6-localhost", "ip6-loopback"):
|
||||
return True
|
||||
try:
|
||||
addr = ipaddress.ip_address(host)
|
||||
except ValueError:
|
||||
# Not an IP literal — could be a hostname like "ollama.internal".
|
||||
# Don't try DNS resolution here (slow + ambient): only IP literals
|
||||
# and the `localhost` alias get the no-strip treatment via this path.
|
||||
return False
|
||||
return addr.is_loopback or addr.is_private or addr.is_link_local
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def resolve_model_provider(model_id: str) -> tuple:
|
||||
"""Resolve model name, provider, and base_url for AIAgent.
|
||||
|
||||
@@ -1250,6 +1316,15 @@ def resolve_model_provider(model_id: str) -> tuple:
|
||||
# just because the model name contains a slash (e.g. google/gemma-4-26b-a4b).
|
||||
# The user has explicitly pointed at a base_url, so trust their routing config.
|
||||
if config_base_url:
|
||||
# Local model servers (LM Studio, Ollama, llama.cpp, vLLM, TabbyAPI)
|
||||
# register models under their full HuggingFace-style id. Stripping the
|
||||
# prefix breaks the lookup and causes a fresh instance to load with
|
||||
# default settings, ignoring user-tuned context length / parallel slots.
|
||||
# See #1625. Detect either by canonical provider name OR by base_url
|
||||
# pointing at a loopback/private host.
|
||||
if (str(config_provider or "").lower() in _LOCAL_SERVER_PROVIDERS
|
||||
or _base_url_points_at_local_server(config_base_url)):
|
||||
return model_id, config_provider, config_base_url
|
||||
# Only strip the provider prefix when it's a known provider namespace
|
||||
# (e.g. "openai/gpt-5.4" → "gpt-5.4" for a custom OpenAI-compatible proxy).
|
||||
# Unknown prefixes (e.g. "zai-org/GLM-5.1" on DeepInfra) are intrinsic to
|
||||
@@ -1506,6 +1581,43 @@ _provider_models_invalidated_ts: dict[str, float] = {} # provider_id -> timesta
|
||||
# HERMES_WEBUI_STATE_DIR / port) has its own file and test runs never
|
||||
# pollute the production server's cache. Also works on macOS and Windows
|
||||
# where /dev/shm does not exist.
|
||||
def _current_webui_version() -> str | None:
|
||||
"""Lazy resolver for the WebUI version, used to stamp the disk cache (#1633).
|
||||
|
||||
`api.updates` imports `api.config` at module-load time, so we cannot
|
||||
`from api.updates import WEBUI_VERSION` at the top of this module without a
|
||||
circular import. Instead we resolve lazily on each cache load/save.
|
||||
|
||||
Returns the runtime version string (e.g. ``v0.50.293``) when api.updates
|
||||
has been imported, or None if it isn't loaded yet (boot-time corner case
|
||||
before the server has finished initializing). A None return is treated as
|
||||
"do not stamp / do not validate" by the cache layer so cache reads/writes
|
||||
that happen during early init still work — the next call after init will
|
||||
stamp normally.
|
||||
"""
|
||||
try:
|
||||
# Read attribute via dotted lookup so we don't add an import-time edge.
|
||||
import sys as _sys
|
||||
mod = _sys.modules.get('api.updates')
|
||||
if mod is None:
|
||||
return None
|
||||
v = getattr(mod, 'WEBUI_VERSION', None)
|
||||
return str(v) if v else None
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
# Disk-cache schema version (#1633).
|
||||
#
|
||||
# Bumped any time the disk cache shape changes in a backward-incompatible way
|
||||
# (e.g. new required field, renamed key). Independent of the WebUI version
|
||||
# stamp — _webui_version forces a rebuild on every release; _schema_version
|
||||
# guarantees that even if a future release accidentally reuses the same
|
||||
# WebUI version string (or a debug build doesn't have a version), a structural
|
||||
# change still invalidates the cache.
|
||||
_MODELS_CACHE_SCHEMA_VERSION = 2
|
||||
|
||||
|
||||
_models_cache_path = STATE_DIR / "models_cache.json"
|
||||
|
||||
|
||||
@@ -1517,7 +1629,15 @@ def _delete_models_cache_on_disk() -> None:
|
||||
|
||||
|
||||
def _is_valid_models_cache(cache: object) -> bool:
|
||||
"""Return True when a disk cache payload has the full /api/models shape."""
|
||||
"""Return True when a cache payload has the full /api/models shape.
|
||||
|
||||
SHAPE-only check: validates structural correctness of an in-memory or
|
||||
on-disk cache. Use _is_loadable_disk_cache() for the strictness needed
|
||||
when reading from disk (it adds version-stamp invalidation per #1633).
|
||||
|
||||
Kept loose so in-memory cache writes (which never touch disk and so don't
|
||||
need version stamping) can use this validator unchanged.
|
||||
"""
|
||||
if not isinstance(cache, dict):
|
||||
return False
|
||||
if not {"active_provider", "default_model", "configured_model_badges", "groups"}.issubset(cache):
|
||||
@@ -1531,8 +1651,59 @@ def _is_valid_models_cache(cache: object) -> bool:
|
||||
)
|
||||
|
||||
|
||||
def _is_loadable_disk_cache(cache: object) -> bool:
|
||||
"""Return True when an on-disk cache is safe to use after a process boot.
|
||||
|
||||
Adds two checks on top of _is_valid_models_cache (#1633):
|
||||
1. ``_schema_version`` matches `_MODELS_CACHE_SCHEMA_VERSION`. A bumped
|
||||
schema version unconditionally invalidates older cache files.
|
||||
2. ``_webui_version`` matches the current runtime version. Forces a
|
||||
rebuild after every release so users see picker-shape fixes
|
||||
immediately, instead of waiting up to 24 hours for the TTL to expire.
|
||||
If the runtime version cannot be resolved (early-init edge case),
|
||||
skip this check rather than wedge the boot.
|
||||
|
||||
Note: ``_webui_version`` is a string equality check, not a semver compare —
|
||||
two debug builds with the same `WEBUI_VERSION` string but different actual
|
||||
code wouldn't invalidate via this axis. ``_schema_version`` is the
|
||||
independent invalidation axis for breaking changes that lack a tag bump;
|
||||
bump it whenever the cache shape changes incompatibly.
|
||||
"""
|
||||
if not _is_valid_models_cache(cache):
|
||||
return False
|
||||
if not isinstance(cache, dict): # appease type-narrowing — already guarded above
|
||||
return False
|
||||
cached_schema = cache.get("_schema_version")
|
||||
if cached_schema != _MODELS_CACHE_SCHEMA_VERSION:
|
||||
# DEBUG telemetry per stage-294 absorption: makes "why did my cache
|
||||
# rebuild" investigations one log-grep away.
|
||||
logger.debug(
|
||||
"models cache rejected: schema=%r vs runtime=%r",
|
||||
cached_schema, _MODELS_CACHE_SCHEMA_VERSION,
|
||||
)
|
||||
return False
|
||||
runtime_version = _current_webui_version()
|
||||
if runtime_version is not None:
|
||||
cached_version = cache.get("_webui_version")
|
||||
if not isinstance(cached_version, str) or cached_version != runtime_version:
|
||||
logger.debug(
|
||||
"models cache rejected: webui_version=%r vs runtime=%r",
|
||||
cached_version, runtime_version,
|
||||
)
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def _load_models_cache_from_disk() -> dict | None:
|
||||
"""Load /api/models cache from disk if it exists and has current metadata."""
|
||||
"""Load /api/models cache from disk if it exists and has current metadata.
|
||||
|
||||
Adds the per-release version check from #1633: a cache stamped with a
|
||||
different WebUI version is treated as missing, forcing a fresh rebuild
|
||||
that picks up any picker-shape fixes shipped in the new release. The
|
||||
returned dict is the SHAPE-only cache (without the `_webui_version` /
|
||||
`_schema_version` stamps) so callers don't have to know about the
|
||||
on-disk metadata fields.
|
||||
"""
|
||||
try:
|
||||
import json as _j
|
||||
|
||||
@@ -1540,28 +1711,52 @@ def _load_models_cache_from_disk() -> dict | None:
|
||||
return None
|
||||
with open(_models_cache_path, encoding="utf-8") as f:
|
||||
cache = _j.load(f)
|
||||
return cache if _is_valid_models_cache(cache) else None
|
||||
if not _is_loadable_disk_cache(cache):
|
||||
return None
|
||||
# Strip the disk-only metadata before returning, so the in-memory
|
||||
# cache shape stays exactly what the rest of the code expects.
|
||||
return {
|
||||
"active_provider": cache["active_provider"],
|
||||
"default_model": cache["default_model"],
|
||||
"configured_model_badges": cache["configured_model_badges"],
|
||||
"groups": cache["groups"],
|
||||
}
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _save_models_cache_to_disk(cache: dict) -> None:
|
||||
"""Save cache to disk so it survives server restarts."""
|
||||
"""Save cache to disk so it survives server restarts.
|
||||
|
||||
Stamps the payload with `_webui_version` and `_schema_version` (#1633) so
|
||||
a subsequent process running a different WebUI version, or a future
|
||||
release that bumps the schema, will treat the file as invalid and
|
||||
rebuild from live provider data on its first /api/models call.
|
||||
|
||||
The version stamp is omitted (not the literal None — the field is just
|
||||
skipped) when the runtime version cannot be resolved at the moment of
|
||||
save, which would happen only in a very early boot path before
|
||||
api.updates is loaded. _is_loadable_disk_cache treats a missing field as
|
||||
a mismatch (since runtime_version is non-None on every subsequent call),
|
||||
so this is safe — at worst we write one cache file that gets rejected
|
||||
once on the next boot.
|
||||
"""
|
||||
try:
|
||||
if not _is_valid_models_cache(cache):
|
||||
return
|
||||
payload = {
|
||||
"_schema_version": _MODELS_CACHE_SCHEMA_VERSION,
|
||||
"active_provider": cache["active_provider"],
|
||||
"default_model": cache["default_model"],
|
||||
"configured_model_badges": cache["configured_model_badges"],
|
||||
"groups": cache["groups"],
|
||||
}
|
||||
runtime_version = _current_webui_version()
|
||||
if runtime_version is not None:
|
||||
payload["_webui_version"] = runtime_version
|
||||
tmp = str(_models_cache_path) + f".{os.getpid()}.tmp"
|
||||
with open(tmp, "w", encoding="utf-8") as f:
|
||||
json.dump(
|
||||
{
|
||||
"active_provider": cache["active_provider"],
|
||||
"default_model": cache["default_model"],
|
||||
"configured_model_badges": cache["configured_model_badges"],
|
||||
"groups": cache["groups"],
|
||||
},
|
||||
f,
|
||||
indent=2,
|
||||
)
|
||||
json.dump(payload, f, indent=2)
|
||||
os.rename(tmp, str(_models_cache_path))
|
||||
except Exception:
|
||||
pass # Non-fatal -- cache will rebuild on next call
|
||||
|
||||
+56
-1
@@ -713,11 +713,32 @@ def _apply_core_sync_or_error_marker(
|
||||
return True
|
||||
|
||||
|
||||
# ── _repair_stale_pending grace period (#1624) ─────────────────────────────
|
||||
#
|
||||
# Defense-in-depth against a narrow race between the streaming thread clearing
|
||||
# pending_user_message and STREAMS.pop(stream_id). Without this guard, any
|
||||
# fast turn (e.g. command approval) that exits the thread before the on-disk
|
||||
# pending clear has flushed gets misdiagnosed as a crashed turn, producing a
|
||||
# spurious "Previous turn did not complete." marker.
|
||||
#
|
||||
# 30s covers the worst-case post-loop persistence window: LLM finishing a tool
|
||||
# batch + lock contention with the checkpoint thread + a multi-MB session.save.
|
||||
# A legitimately crashed turn whose pending_started_at is < 30s old will not
|
||||
# repair on the first get_session() call, but WILL repair on the next call
|
||||
# after the grace period elapses (typically the user's next interaction).
|
||||
#
|
||||
# Missing/falsy pending_started_at (legacy sidecars from before that field
|
||||
# existed, or any path that forgot to set it) is treated as "old enough" so
|
||||
# repair still recovers them — preserves current behavior for legacy data.
|
||||
_REPAIR_STALE_PENDING_GRACE_SECONDS = 30
|
||||
|
||||
|
||||
def _repair_stale_pending(session) -> bool:
|
||||
"""Recover a sidecar stuck with messages=[] and stale pending state.
|
||||
|
||||
Fires only when messages is empty, pending_user_message is set,
|
||||
active_stream_id is set, and the stream is no longer alive.
|
||||
active_stream_id is set, the stream is no longer alive, AND the turn is
|
||||
older than _REPAIR_STALE_PENDING_GRACE_SECONDS (#1624).
|
||||
|
||||
Uses a non-blocking lock acquire so a caller that already holds the
|
||||
per-session lock (e.g. retry_last, undo_last, cancel_stream) cannot
|
||||
@@ -735,6 +756,26 @@ def _repair_stale_pending(session) -> bool:
|
||||
or _seen_stream_id in _active_stream_ids()):
|
||||
return False
|
||||
|
||||
# Grace-period guard: bail if the turn is too fresh to be a real crash.
|
||||
# Falsy pending_started_at (None, 0, missing) means "old enough" — preserve
|
||||
# legacy-data recovery semantics for sessions that pre-date the field.
|
||||
_started = getattr(session, 'pending_started_at', None)
|
||||
if _started:
|
||||
try:
|
||||
_age = time.time() - float(_started)
|
||||
except (TypeError, ValueError):
|
||||
_age = float('inf')
|
||||
if _age < _REPAIR_STALE_PENDING_GRACE_SECONDS:
|
||||
logger.debug(
|
||||
"_repair_stale_pending: skipping repair for session %s — "
|
||||
"pending_started_at age=%.1fs < %ds grace window",
|
||||
session.session_id, _age, _REPAIR_STALE_PENDING_GRACE_SECONDS,
|
||||
)
|
||||
return False
|
||||
else:
|
||||
# Treat missing/falsy pending_started_at as "old enough" (legacy data).
|
||||
_age = float('inf')
|
||||
|
||||
sid = session.session_id
|
||||
if not sid or not all(c in '0123456789abcdefghijklmnopqrstuvwxyz_' for c in sid):
|
||||
return False
|
||||
@@ -753,6 +794,20 @@ def _repair_stale_pending(session) -> bool:
|
||||
)
|
||||
return False
|
||||
try:
|
||||
# Telemetry (#1624): log legitimate repair firings so the next batch
|
||||
# of user reports tells us whether the underlying race still fires
|
||||
# post-fix. Rate-limit by age (Opus pre-release SHOULD-FIX): WARNING
|
||||
# for the diagnostically valuable race window (< 5 min — actual
|
||||
# leak-path candidates that slipped past the grace guard) and DEBUG
|
||||
# for the long-tail (orphaned sidecars from prior process lifetimes)
|
||||
# so reconnect loops on stuck sessions don't flood the log.
|
||||
_DIAG_WARN_WINDOW_SECONDS = 300 # 5 min
|
||||
_age_str = ('inf' if _age == float('inf') else f'{_age:.1f}s')
|
||||
_log = logger.warning if _age < _DIAG_WARN_WINDOW_SECONDS else logger.debug
|
||||
_log(
|
||||
"_repair_stale_pending firing: session=%s stream_id=%s pending_age=%s",
|
||||
sid, _seen_stream_id, _age_str,
|
||||
)
|
||||
return _apply_core_sync_or_error_marker(
|
||||
session, core_path, stream_id_for_recheck=_seen_stream_id,
|
||||
)
|
||||
|
||||
+19
-5
@@ -101,6 +101,20 @@ def _all_profiles_query_flag(parsed_url) -> bool:
|
||||
raw = qs.get('all_profiles', [''])[0].strip().lower()
|
||||
return raw in ('1', 'true', 'yes', 'on')
|
||||
|
||||
# ── SSE app-level heartbeat (#1623) ────────────────────────────────────────
|
||||
#
|
||||
# Kernel TCP keepalive (server.py setsockopt block) declares a peer dead at
|
||||
# KEEPIDLE (10s) + KEEPINTVL (5s) * KEEPCNT (3) = 25s in the worst case. The
|
||||
# app-level SSE heartbeat must fire well below that window so flaky-network
|
||||
# probes never get the chance to kill an idle stream during long LLM thinking
|
||||
# phases. 5s gives the kernel ~5x headroom: probe at 10s, heartbeat byte at
|
||||
# every 5s of idle keeps the socket warm.
|
||||
#
|
||||
# Cost: ~12 bytes per heartbeat * 12 extra heartbeats/min = ~150B/min idle.
|
||||
# Trivial; many production SSE deployments run 5-15s heartbeats specifically
|
||||
# to handle proxies and mobile NAT.
|
||||
_SSE_HEARTBEAT_INTERVAL_SECONDS = 5
|
||||
|
||||
|
||||
def _normalize_messaging_source(raw_source) -> str:
|
||||
return str(raw_source or "").strip().lower()
|
||||
@@ -3778,7 +3792,7 @@ def _handle_sse_stream(handler, parsed):
|
||||
try:
|
||||
while True:
|
||||
try:
|
||||
event, data = subscriber.get(timeout=30)
|
||||
event, data = subscriber.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)
|
||||
except queue.Empty:
|
||||
handler.wfile.write(b": heartbeat\n\n")
|
||||
handler.wfile.flush()
|
||||
@@ -3901,7 +3915,7 @@ def _handle_terminal_output(handler, parsed):
|
||||
try:
|
||||
while True:
|
||||
try:
|
||||
event, data = term.output.get(timeout=25)
|
||||
event, data = term.output.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)
|
||||
except queue.Empty:
|
||||
handler.wfile.write(b": terminal heartbeat\n\n")
|
||||
handler.wfile.flush()
|
||||
@@ -3986,7 +4000,7 @@ def _handle_gateway_sse_stream(handler, parsed):
|
||||
|
||||
while True:
|
||||
try:
|
||||
event_data = q.get(timeout=30)
|
||||
event_data = q.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)
|
||||
except queue.Empty:
|
||||
handler.wfile.write(b': keepalive\n\n')
|
||||
handler.wfile.flush()
|
||||
@@ -4309,7 +4323,7 @@ def _handle_approval_sse_stream(handler, parsed):
|
||||
try:
|
||||
while True:
|
||||
try:
|
||||
payload = q.get(timeout=30)
|
||||
payload = q.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)
|
||||
except queue.Empty:
|
||||
# Keepalive — SSE comment line prevents proxy/CDN timeout.
|
||||
handler.wfile.write(b': keepalive\n\n')
|
||||
@@ -4410,7 +4424,7 @@ def _handle_clarify_sse_stream(handler, parsed):
|
||||
try:
|
||||
while True:
|
||||
try:
|
||||
payload = q.get(timeout=30)
|
||||
payload = q.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)
|
||||
except queue.Empty:
|
||||
handler.wfile.write(b': keepalive\n\n')
|
||||
handler.wfile.flush()
|
||||
|
||||
+18
-2
@@ -188,8 +188,24 @@ if [ "A${whoami}" == "Ahermeswebuitoo" ]; then
|
||||
# We are altering the UID/GID of the hermeswebui user to the desired ones and restarting as that user
|
||||
# using usermod for the already create hermeswebui user, knowing it is not already in use
|
||||
# per usermod manual: "You must make certain that the named user is not executing any processes when this command is being executed"
|
||||
sudo groupmod -o -g ${WANTED_GID} hermeswebui || error_exit "Failed to set GID of hermeswebui user"
|
||||
sudo usermod -o -u ${WANTED_UID} hermeswebui || error_exit "Failed to set UID of hermeswebui user"
|
||||
# Guard for read-only root filesystem (podman with read_only=true, issue #1470).
|
||||
_readonly_root=false
|
||||
if [ ! -w /etc/group ] || [ ! -w /etc/passwd ]; then
|
||||
_readonly_root=true
|
||||
echo " !! Detected read-only root filesystem — /etc/group or /etc/passwd is not writable"
|
||||
fi
|
||||
if [ "A${_readonly_root}" == "Atrue" ]; then
|
||||
_current_hermeswebui_gid=$(id -g hermeswebui 2>/dev/null || echo "")
|
||||
_current_hermeswebui_uid=$(id -u hermeswebui 2>/dev/null || echo "")
|
||||
if [ "A${_current_hermeswebui_gid}" == "A${WANTED_GID}" ] && [ "A${_current_hermeswebui_uid}" == "A${WANTED_UID}" ]; then
|
||||
echo " -- Skipping groupmod/usermod — hermeswebui already has UID ${WANTED_UID} GID ${WANTED_GID} and root fs is read-only"
|
||||
else
|
||||
error_exit "Cannot modify /etc/group or /etc/passwd (read-only root fs). Set UID=${_current_hermeswebui_uid} and GID=${_current_hermeswebui_gid} to match, or run without read_only=true. See issue #1470."
|
||||
fi
|
||||
else
|
||||
sudo groupmod -o -g ${WANTED_GID} hermeswebui || error_exit "Failed to set GID of hermeswebui user"
|
||||
sudo usermod -o -u ${WANTED_UID} hermeswebui || error_exit "Failed to set UID of hermeswebui user"
|
||||
fi
|
||||
sudo chown -R ${WANTED_UID}:${WANTED_GID} /home/hermeswebui || error_exit "Failed to set owner of /home/hermeswebui"
|
||||
save_env /tmp/hermeswebuitoo_env.txt
|
||||
# restart the script as hermeswebui set with the correct UID/GID this time
|
||||
|
||||
@@ -1359,7 +1359,14 @@ window.addEventListener('resize',()=>{
|
||||
if(_sessionActionMenu && _sessionActionAnchor) _positionSessionActionMenu(_sessionActionAnchor);
|
||||
});
|
||||
|
||||
// Generation counter to discard stale API responses (issue #1430).
|
||||
// Multiple callers (message send, rename, session switch) fire renderSessionList()
|
||||
// concurrently. Without this guard, a slower older response can overwrite _allSessions
|
||||
// with stale data, causing sessions to vanish from the sidebar.
|
||||
let _renderSessionListGen = 0;
|
||||
|
||||
async function renderSessionList(){
|
||||
const _gen = ++_renderSessionListGen;
|
||||
try{
|
||||
if(!($('sessionSearch').value||'').trim()) _contentSearchResults = [];
|
||||
const allProfilesQS = _showAllProfiles ? '?all_profiles=1' : '';
|
||||
@@ -1367,6 +1374,8 @@ async function renderSessionList(){
|
||||
api('/api/sessions' + allProfilesQS),
|
||||
api('/api/projects' + allProfilesQS),
|
||||
]);
|
||||
// Discard stale response — a newer renderSessionList() call superseded us.
|
||||
if (_gen !== _renderSessionListGen) return;
|
||||
// Server's other_profile_count tells us how many sessions exist outside the
|
||||
// active profile so the "Show N from other profiles" toggle can render
|
||||
// without a second round-trip. Stashed on the module for renderSessionListFromCache.
|
||||
|
||||
@@ -0,0 +1,89 @@
|
||||
"""Tests for #1623: SSE app heartbeat must fire well under the kernel keepalive timeout.
|
||||
|
||||
Bug shape: server.py's per-connection TCP keepalive (added v0.50.289 / #1581)
|
||||
declares a peer dead at KEEPIDLE=10s + KEEPINTVL=5s * KEEPCNT=3 = 25s. The
|
||||
SSE handlers in api/routes.py used a 30s app-level heartbeat. When the LLM
|
||||
is thinking and the queue is idle, the kernel could tear down the socket
|
||||
before the app sent its first heartbeat byte — flaky-network drops at ~10s
|
||||
that the user perceived as "the stream died around 10 seconds in."
|
||||
|
||||
Fix: lower the heartbeat to 5s at every SSE handler and pin the inequality
|
||||
with a regression test so future tuning of either timer can't re-introduce
|
||||
the misalignment.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
REPO = Path(__file__).parent.parent
|
||||
|
||||
|
||||
def test_sse_heartbeat_constant_below_kernel_keepalive_window():
|
||||
"""The named constant exists and is at most half the kernel keepalive
|
||||
timeout (10 + 5*3 = 25s). 5s gives the kernel ~5x headroom."""
|
||||
src = (REPO / "api" / "routes.py").read_text(encoding="utf-8")
|
||||
|
||||
# The constant must be defined.
|
||||
assert "_SSE_HEARTBEAT_INTERVAL_SECONDS" in src, (
|
||||
"Named SSE heartbeat constant must exist (#1623)"
|
||||
)
|
||||
|
||||
# Pull the literal value.
|
||||
import re
|
||||
m = re.search(r"_SSE_HEARTBEAT_INTERVAL_SECONDS\s*=\s*(\d+)", src)
|
||||
assert m, "Could not parse _SSE_HEARTBEAT_INTERVAL_SECONDS literal"
|
||||
heartbeat = int(m.group(1))
|
||||
|
||||
# Reproduce the kernel-keepalive window from server.py setsockopt block.
|
||||
server_src = (REPO / "server.py").read_text(encoding="utf-8")
|
||||
assert "TCP_KEEPIDLE" in server_src, "TCP_KEEPIDLE must be set on accepted connections"
|
||||
keepidle = int(re.search(r"TCP_KEEPIDLE.*?(\d+)\)", server_src, re.S).group(1))
|
||||
keepintvl = int(re.search(r"TCP_KEEPINTVL.*?(\d+)\)", server_src, re.S).group(1))
|
||||
keepcnt = int(re.search(r"TCP_KEEPCNT.*?(\d+)\)", server_src, re.S).group(1))
|
||||
kernel_window = keepidle + keepintvl * keepcnt
|
||||
|
||||
# The acceptance criterion from the bug: app heartbeat <= kernel window / 2.
|
||||
assert heartbeat * 2 <= kernel_window, (
|
||||
f"App SSE heartbeat ({heartbeat}s) must be at most half of the kernel "
|
||||
f"keepalive window ({kernel_window}s = {keepidle} + {keepintvl}*{keepcnt}). "
|
||||
f"Otherwise flaky-network probes can tear down the socket before the "
|
||||
f"app sends a heartbeat byte. (#1623)"
|
||||
)
|
||||
|
||||
|
||||
def test_no_sse_handler_uses_30s_or_higher_timeout():
|
||||
"""No SSE/long-poll handler in routes.py should still be using the old
|
||||
30s/25s timeout. Every queue.get(timeout=...) call inside an SSE handler
|
||||
must reference the named constant, not a hard-coded number."""
|
||||
src = (REPO / "api" / "routes.py").read_text(encoding="utf-8")
|
||||
|
||||
import re
|
||||
# Catch q.get(timeout=30), subscriber.get(timeout=30), term.output.get(timeout=25), etc.
|
||||
bad = re.findall(r"\.get\(timeout=3[05]\)", src)
|
||||
assert not bad, (
|
||||
f"Found {len(bad)} SSE handler call(s) still using a 25/30s timeout: {bad}. "
|
||||
"All should use _SSE_HEARTBEAT_INTERVAL_SECONDS (#1623)."
|
||||
)
|
||||
|
||||
|
||||
def test_each_named_sse_handler_uses_constant():
|
||||
"""Each known SSE handler queue-poll site must reference the constant."""
|
||||
src = (REPO / "api" / "routes.py").read_text(encoding="utf-8")
|
||||
|
||||
expected_callers = [
|
||||
"subscriber.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)", # main agent SSE
|
||||
"term.output.get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)", # terminal SSE
|
||||
]
|
||||
for caller in expected_callers:
|
||||
assert caller in src, (
|
||||
f"Expected SSE handler to call {caller!r} (#1623). "
|
||||
"If this assertion fails, the SSE heartbeat misalignment may have regressed."
|
||||
)
|
||||
|
||||
# Also: at least 3 sites should be using the constant overall (main agent,
|
||||
# terminal, plus the gateway watcher and approval/clarify pollers).
|
||||
n_uses = src.count("get(timeout=_SSE_HEARTBEAT_INTERVAL_SECONDS)")
|
||||
assert n_uses >= 4, (
|
||||
f"Expected at least 4 SSE/long-poll sites using the named constant; found {n_uses}. "
|
||||
"Every long-lived idle queue poll must align below the kernel keepalive window."
|
||||
)
|
||||
@@ -0,0 +1,160 @@
|
||||
"""Tests for #1624: _repair_stale_pending must not fire on fresh turns.
|
||||
|
||||
Bug shape: _repair_stale_pending() fires whenever pending_user_message is set
|
||||
and the active_stream_id is not in the live STREAMS registry. There's a
|
||||
narrow race between the streaming thread clearing pending_user_message and
|
||||
STREAMS.pop(stream_id), so any fast turn (e.g. command approval) that exits
|
||||
the thread before the on-disk pending clear flushes gets misdiagnosed as a
|
||||
crashed turn — producing a spurious "Previous turn did not complete." marker.
|
||||
|
||||
Fix: add a grace-period guard. A turn whose pending_started_at is younger
|
||||
than _REPAIR_STALE_PENDING_GRACE_SECONDS is treated as "the streaming thread
|
||||
may still be in its post-loop cleanup window" and the repair bails. Missing
|
||||
or falsy pending_started_at (legacy sidecars that pre-date the field) is
|
||||
treated as "old enough" to preserve current legacy-data recovery semantics.
|
||||
"""
|
||||
|
||||
import time
|
||||
import threading
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ── _repair_stale_pending grace guard ───────────────────────────────────
|
||||
|
||||
|
||||
class _FakeSession:
|
||||
"""Minimal stand-in for api.models.Session — only the fields _repair_stale_pending reads."""
|
||||
def __init__(self, sid="abcdef123456", pending="hi", stream_id="stream_xyz",
|
||||
pending_started_at=None, profile="default"):
|
||||
self.session_id = sid
|
||||
self.pending_user_message = pending
|
||||
self.active_stream_id = stream_id
|
||||
self.pending_started_at = pending_started_at
|
||||
self.profile = profile
|
||||
self.messages = []
|
||||
|
||||
|
||||
def _setup_repair_environment(monkeypatch, tmp_path):
|
||||
"""Stub out the costly side-channels in _repair_stale_pending so the
|
||||
tests exercise the guard logic alone, not the full lock+sidecar pipeline."""
|
||||
import api.models as models
|
||||
|
||||
# No live streams — the predicate's "stream not in registry" branch fires.
|
||||
monkeypatch.setattr(models, "_active_stream_ids", lambda: set())
|
||||
|
||||
# Profile home -> tmp dir; sessions/<sid>.json doesn't need to exist
|
||||
# because we'll stub _apply_core_sync_or_error_marker.
|
||||
monkeypatch.setattr(models, "_get_profile_home", lambda profile: tmp_path)
|
||||
(tmp_path / "sessions").mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Track whether the heavy-lift function was called so we can assert.
|
||||
calls = {"applied": 0}
|
||||
|
||||
def fake_apply(session, core_path, **kw):
|
||||
calls["applied"] += 1
|
||||
return True
|
||||
monkeypatch.setattr(models, "_apply_core_sync_or_error_marker", fake_apply)
|
||||
|
||||
return calls
|
||||
|
||||
|
||||
def test_repair_skips_fresh_turn(tmp_path, monkeypatch):
|
||||
"""A turn that started 5 seconds ago is too fresh — repair must bail."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
|
||||
s = _FakeSession(pending_started_at=time.time() - 5.0)
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is False, "Repair must skip a 5s-old turn"
|
||||
assert calls["applied"] == 0, "Heavy-lift _apply_core_sync_or_error_marker must not be called"
|
||||
|
||||
|
||||
def test_repair_skips_almost_grace_window(tmp_path, monkeypatch):
|
||||
"""A turn 1 second younger than the grace threshold must still bail."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
grace = models._REPAIR_STALE_PENDING_GRACE_SECONDS
|
||||
|
||||
s = _FakeSession(pending_started_at=time.time() - (grace - 1.0))
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is False, f"Repair must skip a turn {grace - 1}s old"
|
||||
assert calls["applied"] == 0
|
||||
|
||||
|
||||
def test_repair_fires_after_grace_window(tmp_path, monkeypatch):
|
||||
"""A turn older than the grace window should trigger repair as before."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
grace = models._REPAIR_STALE_PENDING_GRACE_SECONDS
|
||||
|
||||
s = _FakeSession(pending_started_at=time.time() - (grace + 30.0))
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is True, f"Repair must fire on a turn older than {grace}s"
|
||||
assert calls["applied"] == 1, "Heavy-lift _apply_core_sync_or_error_marker should be called"
|
||||
|
||||
|
||||
def test_repair_fires_when_pending_started_at_missing(tmp_path, monkeypatch):
|
||||
"""Legacy sidecars predate `pending_started_at`; missing/falsy must NOT
|
||||
block repair — preserves current behavior for legacy data."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
|
||||
s = _FakeSession(pending_started_at=None)
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is True, "Missing pending_started_at must not block legitimate repair"
|
||||
assert calls["applied"] == 1
|
||||
|
||||
|
||||
def test_repair_fires_when_pending_started_at_zero(tmp_path, monkeypatch):
|
||||
"""Falsy 0 must also be treated as 'old enough' (defense against accidental zeroing)."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
|
||||
s = _FakeSession(pending_started_at=0)
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is True, "pending_started_at=0 must not block legitimate repair"
|
||||
|
||||
|
||||
def test_repair_fires_when_pending_started_at_garbage(tmp_path, monkeypatch):
|
||||
"""Garbage values (string, dict, etc.) shouldn't crash and shouldn't block repair."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
|
||||
s = _FakeSession(pending_started_at="not-a-number")
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is True, "Garbage pending_started_at should be treated as 'old enough'"
|
||||
|
||||
|
||||
def test_repair_skips_when_no_pending_message(tmp_path, monkeypatch):
|
||||
"""Without pending_user_message, repair must always bail (existing contract)."""
|
||||
import api.models as models
|
||||
calls = _setup_repair_environment(monkeypatch, tmp_path)
|
||||
|
||||
s = _FakeSession(pending="", pending_started_at=time.time() - 60)
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is False
|
||||
assert calls["applied"] == 0
|
||||
|
||||
|
||||
def test_repair_skips_when_stream_still_alive(tmp_path, monkeypatch):
|
||||
"""If the stream is still in the registry, repair must bail even past grace."""
|
||||
import api.models as models
|
||||
monkeypatch.setattr(models, "_active_stream_ids", lambda: {"stream_xyz"})
|
||||
monkeypatch.setattr(models, "_get_profile_home", lambda profile: tmp_path)
|
||||
|
||||
s = _FakeSession(pending_started_at=time.time() - 600)
|
||||
result = models._repair_stale_pending(s)
|
||||
assert result is False, "Stream-alive bail predates the grace guard"
|
||||
|
||||
|
||||
def test_grace_constant_exists_and_is_sane():
|
||||
"""The grace constant is exposed and sized in a sane range (10s..120s)."""
|
||||
import api.models as models
|
||||
grace = models._REPAIR_STALE_PENDING_GRACE_SECONDS
|
||||
assert isinstance(grace, (int, float))
|
||||
assert 10 <= grace <= 120, (
|
||||
f"Grace window {grace}s should be 10s-120s — too small re-introduces "
|
||||
"the false-positive race; too large delays legitimate recovery."
|
||||
)
|
||||
@@ -0,0 +1,149 @@
|
||||
"""Tests for #1625: resolve_model_provider must NOT strip provider prefix on local servers.
|
||||
|
||||
Bug shape: with `model.provider: lmstudio`, `model.base_url: http://localhost:1234/v1`,
|
||||
`model.default: qwen/qwen3.6-27b`, resolve_model_provider() stripped the
|
||||
"qwen/" prefix because "qwen" matches an entry in _PROVIDER_MODELS — sending
|
||||
the request to LM Studio with model name "qwen3.6-27b". LM Studio (and Ollama,
|
||||
llama.cpp, vLLM, TabbyAPI) register models under their full HuggingFace-style
|
||||
id, so the stripped name didn't match the loaded model and a fresh instance
|
||||
loaded with default settings, ignoring the user's tuned context length /
|
||||
parallel slots.
|
||||
|
||||
Fix: explicit no-strip path for known local-server providers AND any
|
||||
base_url pointing at a loopback/private host. OpenAI-compatible proxies
|
||||
(LiteLLM, OpenRouter relays) on public URLs continue to get prefix-stripping.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
from api import config as cfg_mod
|
||||
|
||||
|
||||
# ── Helpers ───────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _patch_cfg(monkeypatch, **model_overrides):
|
||||
"""Patch api.config.cfg to a synthetic config dict for the duration of a test."""
|
||||
fake_cfg = {"model": dict(model_overrides), "custom_providers": []}
|
||||
monkeypatch.setattr(cfg_mod, "cfg", fake_cfg)
|
||||
|
||||
|
||||
# ── Local-server providers preserve full model id ──────────────────────────
|
||||
|
||||
|
||||
@pytest.mark.parametrize("provider_name", [
|
||||
"lmstudio",
|
||||
"lm-studio", # Opus pre-release NIT
|
||||
"ollama",
|
||||
"llamacpp",
|
||||
"llama-cpp",
|
||||
"vllm",
|
||||
"tabby",
|
||||
"tabbyapi",
|
||||
"koboldcpp",
|
||||
"textgen",
|
||||
"localai", # Opus pre-release NIT
|
||||
])
|
||||
def test_known_local_server_provider_preserves_full_model_id(provider_name, monkeypatch):
|
||||
"""Known local-server provider names must preserve the slashed model id
|
||||
even when the prefix matches _PROVIDER_MODELS."""
|
||||
_patch_cfg(monkeypatch, provider=provider_name, base_url="http://localhost:1234/v1")
|
||||
model, provider, base_url = cfg_mod.resolve_model_provider("qwen/qwen3.6-27b")
|
||||
assert model == "qwen/qwen3.6-27b", (
|
||||
f"Local-server provider {provider_name!r} must preserve the full model id; "
|
||||
f"stripping it makes LM Studio/Ollama/etc. load a fresh instance with "
|
||||
f"default settings (#1625)."
|
||||
)
|
||||
assert provider == provider_name
|
||||
assert base_url == "http://localhost:1234/v1"
|
||||
|
||||
|
||||
def test_lmstudio_with_huggingface_namespace_preserved(monkeypatch):
|
||||
"""The reporter's exact case: lmstudio + qwen/qwen3.6-27b + localhost."""
|
||||
_patch_cfg(monkeypatch, provider="lmstudio", base_url="http://localhost:1234/v1",
|
||||
default="qwen/qwen3.6-27b")
|
||||
model, provider, base_url = cfg_mod.resolve_model_provider("qwen/qwen3.6-27b")
|
||||
assert model == "qwen/qwen3.6-27b"
|
||||
|
||||
|
||||
def test_lmstudio_with_openai_prefix_preserved(monkeypatch):
|
||||
"""openai/gpt-oss-120b on LM Studio is a real HuggingFace id; the namespace
|
||||
is part of the registry key. Must not be stripped on local servers."""
|
||||
_patch_cfg(monkeypatch, provider="lmstudio", base_url="http://localhost:1234/v1")
|
||||
model, provider, base_url = cfg_mod.resolve_model_provider("openai/gpt-oss-120b")
|
||||
assert model == "openai/gpt-oss-120b", (
|
||||
"openai/gpt-oss-120b on LM Studio must preserve the full id (#1625)"
|
||||
)
|
||||
|
||||
|
||||
# ── Loopback / private-IP heuristic ───────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.mark.parametrize("loopback_url", [
|
||||
"http://localhost:11434",
|
||||
"http://127.0.0.1:1234/v1",
|
||||
"http://127.0.0.1:8080/openai",
|
||||
"http://10.0.0.5:8080/v1", # private RFC1918
|
||||
"http://192.168.1.50:1234/v1", # private RFC1918
|
||||
"http://172.16.0.10:8000/v1", # private RFC1918
|
||||
"http://[::1]:1234/v1", # IPv6 loopback
|
||||
])
|
||||
def test_loopback_base_url_preserves_full_model_id(loopback_url, monkeypatch):
|
||||
"""Even with a generic `provider: custom` (or any non-local-server name),
|
||||
a base_url pointing at a loopback or private IP must preserve the model id —
|
||||
almost certainly a local model server."""
|
||||
_patch_cfg(monkeypatch, provider="custom", base_url=loopback_url)
|
||||
model, _, _ = cfg_mod.resolve_model_provider("qwen/qwen3.6-27b")
|
||||
assert model == "qwen/qwen3.6-27b", (
|
||||
f"Loopback/private base_url {loopback_url!r} must preserve the full model id (#1625)"
|
||||
)
|
||||
|
||||
|
||||
# ── Backward compat: OpenAI-compatible proxies keep prefix-stripping ─────
|
||||
|
||||
|
||||
def test_public_openai_proxy_still_strips_prefix(monkeypatch):
|
||||
"""OpenAI-compatible proxies (LiteLLM, public OpenRouter relays) still get
|
||||
the strip behavior so 'openai/gpt-5.4' → 'gpt-5.4'."""
|
||||
_patch_cfg(monkeypatch, provider="openai", base_url="https://litellm.example.com/v1")
|
||||
model, provider, base_url = cfg_mod.resolve_model_provider("openai/gpt-5.4")
|
||||
assert model == "gpt-5.4", (
|
||||
"Public-host openai/* on a non-loopback proxy must continue to strip prefix"
|
||||
)
|
||||
|
||||
|
||||
def test_unknown_prefix_on_public_proxy_preserved(monkeypatch):
|
||||
"""Unknown prefix (zai-org/GLM-5.1) on public proxy passes through full
|
||||
(the existing contract — stripping unknown prefixes caused model_not_found)."""
|
||||
_patch_cfg(monkeypatch, provider="openai", base_url="https://litellm.example.com/v1")
|
||||
model, _, _ = cfg_mod.resolve_model_provider("zai-org/GLM-5.1")
|
||||
assert model == "zai-org/GLM-5.1"
|
||||
|
||||
|
||||
def test_openrouter_passes_full_unaffected(monkeypatch):
|
||||
"""OpenRouter always needs the full provider/model path — pre-existing
|
||||
contract that the local-server fix must not disturb."""
|
||||
_patch_cfg(monkeypatch, provider="openrouter")
|
||||
model, provider, _ = cfg_mod.resolve_model_provider("anthropic/claude-sonnet-4.6")
|
||||
assert model == "anthropic/claude-sonnet-4.6"
|
||||
assert provider == "openrouter"
|
||||
|
||||
|
||||
# ── Helper unit tests ─────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.mark.parametrize("url, expected", [
|
||||
("http://localhost:1234", True),
|
||||
("http://127.0.0.1:1234", True),
|
||||
("http://10.0.0.5", True),
|
||||
("http://192.168.1.1:8080", True),
|
||||
("http://[::1]:1234", True),
|
||||
("http://example.com", False),
|
||||
("https://api.openai.com/v1", False),
|
||||
("https://litellm.example.com/v1", False),
|
||||
("", False),
|
||||
(None, False),
|
||||
("not-a-url", False),
|
||||
])
|
||||
def test_base_url_points_at_local_server_helper(url, expected):
|
||||
assert cfg_mod._base_url_points_at_local_server(url) is expected
|
||||
@@ -0,0 +1,390 @@
|
||||
"""Tests for #1633: /api/models disk cache must be invalidated on WebUI version change.
|
||||
|
||||
Bug shape: STATE_DIR/models_cache.json was persisted across server restarts
|
||||
without any version stamp. A Docker container update from version A to B
|
||||
read the cache file written by version A — users saw stale picker contents
|
||||
(missing models, phantom provider groups, etc.) for up to 24h until either
|
||||
(a) the TTL expired, (b) a provider edit triggered invalidate_models_cache,
|
||||
or (c) they manually deleted the file.
|
||||
|
||||
Fix: stamp the disk cache with the current WEBUI_VERSION + a schema version,
|
||||
and reject loads where either field mismatches. A new release auto-rebuilds
|
||||
the cache on the very next /api/models call instead of lingering for 24h.
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ── Fixtures ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def isolated_cache(tmp_path, monkeypatch):
|
||||
"""Redirect the disk cache to a tmp file and reset api.updates between tests."""
|
||||
from api import config
|
||||
|
||||
cache_path = tmp_path / "models_cache.json"
|
||||
monkeypatch.setattr(config, "_models_cache_path", cache_path)
|
||||
yield cache_path
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def with_runtime_version():
|
||||
"""Return a setter that forces a particular runtime WEBUI_VERSION."""
|
||||
# api.updates must be loaded for the lazy resolver to find it
|
||||
import api.updates as upd
|
||||
original = upd.WEBUI_VERSION
|
||||
|
||||
def _set(version: str):
|
||||
upd.WEBUI_VERSION = version
|
||||
|
||||
yield _set
|
||||
upd.WEBUI_VERSION = original
|
||||
|
||||
|
||||
def _shape_cache():
|
||||
"""Minimal valid cache shape (no version stamps — those are added on save)."""
|
||||
return {
|
||||
"active_provider": "anthropic",
|
||||
"default_model": "claude-sonnet-4.6",
|
||||
"configured_model_badges": {"foo": "bar"},
|
||||
"groups": [{"name": "Anthropic", "models": ["claude-sonnet-4.6"]}],
|
||||
}
|
||||
|
||||
|
||||
# ── _current_webui_version lazy resolver ──────────────────────────────────
|
||||
|
||||
|
||||
def test_current_webui_version_returns_runtime_version(with_runtime_version):
|
||||
"""When api.updates is loaded, the lazy resolver returns its WEBUI_VERSION."""
|
||||
from api.config import _current_webui_version
|
||||
with_runtime_version("v0.50.999-test")
|
||||
assert _current_webui_version() == "v0.50.999-test"
|
||||
|
||||
|
||||
def test_current_webui_version_returns_none_when_module_missing(monkeypatch):
|
||||
"""Early-init path: if api.updates isn't in sys.modules, return None.
|
||||
|
||||
Required so cache reads/writes during very early server boot don't wedge
|
||||
the startup sequence on AttributeError.
|
||||
"""
|
||||
monkeypatch.delitem(sys.modules, "api.updates", raising=False)
|
||||
from api.config import _current_webui_version
|
||||
assert _current_webui_version() is None
|
||||
|
||||
|
||||
# ── Disk cache version stamping ──────────────────────────────────────────
|
||||
|
||||
|
||||
def test_save_stamps_webui_version_on_disk(isolated_cache, with_runtime_version):
|
||||
"""Saving a cache writes both _webui_version and _schema_version stamps."""
|
||||
from api import config
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
|
||||
on_disk = json.load(open(isolated_cache))
|
||||
assert on_disk["_webui_version"] == "v0.50.293"
|
||||
assert on_disk["_schema_version"] == config._MODELS_CACHE_SCHEMA_VERSION
|
||||
|
||||
|
||||
def test_save_omits_webui_version_when_runtime_unknown(isolated_cache, monkeypatch):
|
||||
"""If api.updates isn't loaded (very early boot), save still works but
|
||||
skips the version stamp. The next load with a known runtime version will
|
||||
treat the file as invalid (fail-safe rebuild on first real call)."""
|
||||
monkeypatch.delitem(sys.modules, "api.updates", raising=False)
|
||||
from api import config
|
||||
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
on_disk = json.load(open(isolated_cache))
|
||||
assert "_webui_version" not in on_disk
|
||||
# Schema version is always written — it doesn't depend on api.updates
|
||||
assert on_disk["_schema_version"] == config._MODELS_CACHE_SCHEMA_VERSION
|
||||
|
||||
|
||||
def test_save_only_writes_known_keys(isolated_cache, with_runtime_version):
|
||||
"""Defensive — extra junk in the cache dict shouldn't leak to disk."""
|
||||
from api import config
|
||||
with_runtime_version("v0.50.999")
|
||||
|
||||
cache = _shape_cache()
|
||||
cache["secret_credentials"] = "definitely should not be on disk"
|
||||
cache["__internal_hint"] = "also nope"
|
||||
config._save_models_cache_to_disk(cache)
|
||||
|
||||
on_disk = json.load(open(isolated_cache))
|
||||
assert "secret_credentials" not in on_disk
|
||||
assert "__internal_hint" not in on_disk
|
||||
|
||||
|
||||
# ── Load: version validation ──────────────────────────────────────────────
|
||||
|
||||
|
||||
def test_load_round_trip_matching_version(isolated_cache, with_runtime_version):
|
||||
"""Save then load with the same runtime version returns the original shape."""
|
||||
from api import config
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
original = _shape_cache()
|
||||
config._save_models_cache_to_disk(original)
|
||||
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is not None
|
||||
# Shape preserved
|
||||
assert loaded["active_provider"] == original["active_provider"]
|
||||
assert loaded["default_model"] == original["default_model"]
|
||||
assert loaded["configured_model_badges"] == original["configured_model_badges"]
|
||||
assert loaded["groups"] == original["groups"]
|
||||
# Disk-only metadata stripped before return
|
||||
assert "_webui_version" not in loaded
|
||||
assert "_schema_version" not in loaded
|
||||
|
||||
|
||||
def test_load_rejects_mismatched_webui_version(isolated_cache, with_runtime_version):
|
||||
"""The core #1633 fix: a cache stamped v0.50.281 is invalid at runtime v0.50.293."""
|
||||
from api import config
|
||||
|
||||
# Save under v0.50.281
|
||||
with_runtime_version("v0.50.281")
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
|
||||
# Try to load under v0.50.293
|
||||
with_runtime_version("v0.50.293")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is None, (
|
||||
"Cache stamped with a different WebUI version must be rejected so the "
|
||||
"next call rebuilds with the current release's picker shape (#1633)"
|
||||
)
|
||||
|
||||
|
||||
def test_load_rejects_legacy_cache_without_version_stamp(isolated_cache, with_runtime_version):
|
||||
"""Pre-#1633 cache files have no _webui_version field at all. They must
|
||||
be treated as invalid on the very first load post-update so users get
|
||||
a fresh rebuild instead of stale picker contents."""
|
||||
from api import config
|
||||
|
||||
# Hand-write a pre-#1633 cache file (no version fields)
|
||||
legacy = _shape_cache()
|
||||
json.dump(legacy, open(isolated_cache, "w"))
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is None, (
|
||||
"Legacy (pre-#1633) cache files must be rejected so the first call "
|
||||
"after updating to a release with #1633 rebuilds from live data"
|
||||
)
|
||||
|
||||
|
||||
def test_load_rejects_mismatched_schema_version(isolated_cache, with_runtime_version):
|
||||
"""Schema version mismatch invalidates the cache regardless of WebUI version.
|
||||
Forward-compat for future cache-shape changes."""
|
||||
from api import config
|
||||
|
||||
# Manually write a cache with a stale schema version but matching webui version
|
||||
stale = {
|
||||
"_schema_version": 0, # old
|
||||
"_webui_version": "v0.50.293",
|
||||
**_shape_cache(),
|
||||
}
|
||||
json.dump(stale, open(isolated_cache, "w"))
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is None, (
|
||||
"Cache with a different schema version must be rejected even when "
|
||||
"WebUI version matches"
|
||||
)
|
||||
|
||||
|
||||
def test_load_skips_version_check_when_runtime_unknown(isolated_cache, monkeypatch):
|
||||
"""Early-init: if api.updates isn't loaded, _current_webui_version returns
|
||||
None. The version check should NOT run (because we have nothing to compare
|
||||
against), but other validity checks still apply.
|
||||
|
||||
This is the fail-safe path that prevents a boot-time wedge if the very
|
||||
first /api/models call fires before api.updates is imported.
|
||||
"""
|
||||
from api import config
|
||||
|
||||
# Write a cache that's correct except has no _webui_version
|
||||
cache = {
|
||||
"_schema_version": config._MODELS_CACHE_SCHEMA_VERSION,
|
||||
# no _webui_version
|
||||
**_shape_cache(),
|
||||
}
|
||||
json.dump(cache, open(isolated_cache, "w"))
|
||||
|
||||
monkeypatch.delitem(sys.modules, "api.updates", raising=False)
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
# Loadable because runtime version was unknown — once api.updates loads,
|
||||
# the next call would re-validate.
|
||||
assert loaded is not None
|
||||
|
||||
|
||||
# ── Validity helpers ─────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def test_is_valid_models_cache_remains_shape_only():
|
||||
"""_is_valid_models_cache must NOT enforce version stamps — keep it loose
|
||||
so in-memory cache validations don't fail on missing _webui_version. The
|
||||
strict version check lives in _is_loadable_disk_cache only."""
|
||||
from api.config import _is_valid_models_cache
|
||||
cache = _shape_cache()
|
||||
# No _webui_version field
|
||||
assert _is_valid_models_cache(cache) is True
|
||||
|
||||
|
||||
def test_is_loadable_disk_cache_checks_versions(with_runtime_version):
|
||||
"""_is_loadable_disk_cache must check both schema + webui_version stamps."""
|
||||
from api import config
|
||||
with_runtime_version("v0.50.293")
|
||||
|
||||
# Missing _webui_version
|
||||
bad1 = {"_schema_version": config._MODELS_CACHE_SCHEMA_VERSION, **_shape_cache()}
|
||||
assert config._is_loadable_disk_cache(bad1) is False
|
||||
|
||||
# Wrong _webui_version
|
||||
bad2 = {
|
||||
"_schema_version": config._MODELS_CACHE_SCHEMA_VERSION,
|
||||
"_webui_version": "v0.50.281",
|
||||
**_shape_cache(),
|
||||
}
|
||||
assert config._is_loadable_disk_cache(bad2) is False
|
||||
|
||||
# Wrong _schema_version
|
||||
bad3 = {
|
||||
"_schema_version": 0,
|
||||
"_webui_version": "v0.50.293",
|
||||
**_shape_cache(),
|
||||
}
|
||||
assert config._is_loadable_disk_cache(bad3) is False
|
||||
|
||||
# Right
|
||||
good = {
|
||||
"_schema_version": config._MODELS_CACHE_SCHEMA_VERSION,
|
||||
"_webui_version": "v0.50.293",
|
||||
**_shape_cache(),
|
||||
}
|
||||
assert config._is_loadable_disk_cache(good) is True
|
||||
|
||||
|
||||
def test_is_loadable_disk_cache_rejects_non_dict():
|
||||
"""Non-dict input is invalid even when version checks are skipped."""
|
||||
from api.config import _is_loadable_disk_cache
|
||||
assert _is_loadable_disk_cache(None) is False
|
||||
assert _is_loadable_disk_cache([]) is False
|
||||
assert _is_loadable_disk_cache("string") is False
|
||||
assert _is_loadable_disk_cache(42) is False
|
||||
|
||||
|
||||
# ── Edge cases ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def test_load_handles_corrupt_json(isolated_cache, with_runtime_version):
|
||||
"""A corrupt cache file (truncated JSON, non-UTF8 bytes) must return None
|
||||
silently, not raise — the cache layer is best-effort."""
|
||||
from api import config
|
||||
|
||||
with open(isolated_cache, "wb") as f:
|
||||
f.write(b"{not valid json at all")
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is None
|
||||
|
||||
|
||||
def test_load_handles_missing_file(isolated_cache, with_runtime_version):
|
||||
"""Cache file simply doesn't exist (cold boot) → None, not error."""
|
||||
from api import config
|
||||
# isolated_cache fixture creates the path but not the file
|
||||
assert not isolated_cache.exists()
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded is None
|
||||
|
||||
|
||||
def test_save_overwrite_atomic(isolated_cache, with_runtime_version):
|
||||
"""Saving twice with different versions overwrites cleanly via tmp+rename."""
|
||||
from api import config
|
||||
|
||||
with_runtime_version("v0.50.281")
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
assert json.load(open(isolated_cache))["_webui_version"] == "v0.50.281"
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
assert json.load(open(isolated_cache))["_webui_version"] == "v0.50.293"
|
||||
|
||||
|
||||
def test_save_skips_invalid_shape(isolated_cache, with_runtime_version):
|
||||
"""Pre-#1633 contract: invalid shape never lands on disk. Preserved."""
|
||||
from api import config
|
||||
with_runtime_version("v0.50.293")
|
||||
|
||||
# Missing required keys
|
||||
config._save_models_cache_to_disk({"active_provider": "anthropic"})
|
||||
assert not isolated_cache.exists()
|
||||
|
||||
|
||||
# ── End-to-end: simulate a Docker container update ───────────────────────
|
||||
|
||||
|
||||
def test_docker_update_scenario_invalidates_old_cache(isolated_cache, with_runtime_version):
|
||||
"""Reproduce Deor's exact scenario from the bug report:
|
||||
|
||||
1. Server v0.50.281 builds a cache and writes it to STATE_DIR.
|
||||
2. Container is updated to v0.50.292 (new image, same mounted state volume).
|
||||
3. New server boots and tries to load the cache file.
|
||||
4. Expected: load returns None, forcing a rebuild that picks up the
|
||||
picker fixes shipped between v0.50.281 and v0.50.292.
|
||||
"""
|
||||
from api import config
|
||||
|
||||
# Step 1: v0.50.281 writes cache
|
||||
with_runtime_version("v0.50.281")
|
||||
old_cache = {
|
||||
"active_provider": "nous",
|
||||
"default_model": "anthropic/claude-sonnet-4.6",
|
||||
"configured_model_badges": {"anthropic/claude-sonnet-4.6": "Anthropic"},
|
||||
# The pre-fix Nous group with only 4 models (the v0.50.281 bug)
|
||||
"groups": [{"name": "Nous Portal", "models": ["a", "b", "c", "d"]}],
|
||||
}
|
||||
config._save_models_cache_to_disk(old_cache)
|
||||
on_disk = json.load(open(isolated_cache))
|
||||
assert on_disk["_webui_version"] == "v0.50.281"
|
||||
assert len(on_disk["groups"][0]["models"]) == 4
|
||||
|
||||
# Step 2-3: Container updates to v0.50.292; new server tries to load
|
||||
with_runtime_version("v0.50.292")
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
|
||||
# Step 4: cache rejected → caller will rebuild from live provider data
|
||||
assert loaded is None, (
|
||||
"After a WebUI version bump, the disk cache must be rejected so users "
|
||||
"see picker fixes immediately instead of waiting up to 24h for the TTL "
|
||||
"(#1633: Deor reported v0.50.292 looking identical to v0.50.281 because "
|
||||
"the v0.50.281 cache file was being reused unchanged)"
|
||||
)
|
||||
|
||||
|
||||
# ── invalidate_models_cache still cleans the disk file ───────────────────
|
||||
|
||||
|
||||
def test_invalidate_models_cache_still_deletes_disk_file(isolated_cache, with_runtime_version):
|
||||
"""Pre-existing contract preserved: invalidate_models_cache() drops the
|
||||
in-memory cache AND deletes the disk file. The version stamping must not
|
||||
interfere with this teardown path."""
|
||||
from api import config
|
||||
|
||||
with_runtime_version("v0.50.293")
|
||||
config._save_models_cache_to_disk(_shape_cache())
|
||||
assert isolated_cache.exists()
|
||||
|
||||
config.invalidate_models_cache()
|
||||
assert not isolated_cache.exists()
|
||||
@@ -33,8 +33,22 @@ def test_save_models_cache_to_disk_preserves_response_metadata(tmp_path, monkeyp
|
||||
|
||||
config._save_models_cache_to_disk(payload)
|
||||
|
||||
assert json.loads(cache_path.read_text(encoding="utf-8")) == payload
|
||||
assert config._load_models_cache_from_disk() == payload
|
||||
on_disk = json.loads(cache_path.read_text(encoding="utf-8"))
|
||||
# The four response-shape fields round-trip verbatim.
|
||||
for k, v in payload.items():
|
||||
assert on_disk[k] == v, f"Field {k!r} did not round-trip"
|
||||
# Plus the disk-only metadata stamps added by #1633 — present but not part
|
||||
# of the response payload.
|
||||
assert "_schema_version" in on_disk
|
||||
# _webui_version may be absent in early-init paths where api.updates isn't
|
||||
# yet imported; in normal test runs api.updates IS imported, so assert it.
|
||||
import sys
|
||||
if "api.updates" in sys.modules:
|
||||
assert on_disk.get("_webui_version") == sys.modules["api.updates"].WEBUI_VERSION
|
||||
|
||||
# Load returns ONLY the response-shape fields (stamps stripped).
|
||||
loaded = config._load_models_cache_from_disk()
|
||||
assert loaded == payload
|
||||
|
||||
|
||||
def test_load_models_cache_from_disk_rejects_legacy_groups_only_cache(tmp_path, monkeypatch):
|
||||
|
||||
@@ -466,11 +466,18 @@ def test_custom_endpoint_uses_model_config_api_key_for_model_discovery(monkeypat
|
||||
# -- Issue #230: custom provider with slash model name -----------------------
|
||||
|
||||
def test_custom_endpoint_slash_model_routes_to_custom_not_openrouter():
|
||||
"""Regression test for #230.
|
||||
"""Regression test for #230, updated for #1625.
|
||||
|
||||
When provider=custom (or any non-openrouter provider) and base_url is set,
|
||||
a model name containing a slash (e.g. google/gemma-4-26b-a4b) must NOT be
|
||||
rerouted to OpenRouter -- it should stay on the configured custom endpoint.
|
||||
|
||||
#1625 layered an additional rule on top: a base_url pointing at a loopback
|
||||
or private-IP host is treated as a local model server (LM Studio, Ollama,
|
||||
llama.cpp, vLLM, TabbyAPI), which register models under their full
|
||||
HuggingFace path. On such hosts the prefix is now PRESERVED. The original
|
||||
#433 strip behaviour still applies on public hosts (real OpenAI-compatible
|
||||
proxies like LiteLLM at https://litellm.example.com/v1).
|
||||
"""
|
||||
# --- custom provider with slash model name should NOT go to openrouter ---
|
||||
model, provider, base_url = _resolve_with_config(
|
||||
@@ -486,10 +493,22 @@ def test_custom_endpoint_slash_model_routes_to_custom_not_openrouter():
|
||||
assert base_url == 'http://127.0.0.1:1234/v1', (
|
||||
"Expected base_url 'http://127.0.0.1:1234/v1', got '{}'.".format(base_url)
|
||||
)
|
||||
# Fix #433: provider prefix is now stripped for custom endpoints so stale
|
||||
# prefixed model IDs from previous sessions do not break custom endpoint routing.
|
||||
assert model == 'gemma-4-26b-a4b', (
|
||||
"Model name prefix should be stripped for custom base_url endpoint, got '{}'.".format(model)
|
||||
# #1625 (supersedes the v0.50 #433 strip-on-custom rule for loopback hosts):
|
||||
# 127.0.0.1 base_url is almost certainly a local LM Studio / Ollama / etc.,
|
||||
# which keys models on the full HuggingFace path. Preserve the prefix.
|
||||
assert model == 'google/gemma-4-26b-a4b', (
|
||||
"Model name prefix must be PRESERVED on loopback base_url (#1625), got '{}'.".format(model)
|
||||
)
|
||||
|
||||
# --- public-host openai-compatible proxy STILL strips per #433 ----------
|
||||
model2, provider2, base_url2 = _resolve_with_config(
|
||||
'google/gemma-4-26b-a4b',
|
||||
provider='openai',
|
||||
base_url='https://litellm.example.com/v1',
|
||||
default='google/gemma-4-26b-a4b',
|
||||
)
|
||||
assert model2 == 'gemma-4-26b-a4b', (
|
||||
"Public-host OpenAI-compat proxy must still strip prefix per #433, got '{}'.".format(model2)
|
||||
)
|
||||
|
||||
# --- openrouter with slash model name MUST still route to openrouter -----
|
||||
|
||||
Reference in New Issue
Block a user