feat: add manual provider usage refresh (Jordan-SkyLF)
Adds a 'Refresh usage' button on the Provider quota card in Settings → Providers,
with cache: 'no-store' fetch + browser cache-bust query string. Pure browser-side
cache-busting; the server-side /api/provider/quota endpoint has no cache layer
yet (refresh=1 query param is currently a no-op server-side; the win is bypassing
browser/proxy/SW caches).
fix: guard stale stream writebacks (LumenYoung)
Prevents stale WebUI stream workers from writing old results into a session
after that session has already moved on to another stream. Adds new helper
_stream_writeback_is_current() (a token equality check against the session's
active_stream_id) and short-circuits the two finalize/cancel paths when the
worker no longer owns the session writeback.
#2142 (legeantbleu) added the fr locale to static/i18n.js but didn't update:
1. tests/test_issue1488_composer_voice_buttons.py: two TestComposerVoiceButtonI18n + TestVoiceModePreferenceGate LOCALES tuples needed 'fr'
2. api/routes.py: _LOGIN_LOCALE needed an 'fr' block so the login page localizes for French users (issue #1442 parity contract)
3. tests/test_login_locale_parity.py: the test asserting 'fr' falls-back-to-'en' is inverted — fr now resolves to fr, with sibling assertions for fr-FR and fr-CA
Mirrors the stage-340 fix for the it locale (PR #2067 → maintainer adds tuple entries). 46/46 i18n tests pass after fix.
Reasoning models (Qwen3-thinking via LM Studio, DeepSeek-R1, Kimi-K2,
etc.) can burn their entire output budget on hidden reasoning tokens and
emit no visible content. The previous title-generation retry path
classified that as llm_length and doubled the budget — but the second
call produces the same shape, so the retry only doubled the GPU/credit
burn. Repeated across the two prompts in _title_prompts() this came to
~3000 reasoning tokens of GPU work per new chat. On local LM Studio
servers behind a custom: provider (where is_lmstudio=False means
reasoning_effort: none never reaches the model) it manifested as the GPU
never going idle after a prompt.
Fix:
- _extract_title_response: classify reasoning-bearing empty responses
as llm_empty_reasoning regardless of finish_reason. The presence of
reasoning_content is the diagnostic signal, not finish_reason.
- _title_retry_status: drop llm_empty_reasoning from the retry set.
Length-truncated responses WITHOUT reasoning still retry (those are
legitimately recoverable by a larger budget).
- Add _title_should_skip_remaining_attempts() and break out of the
prompt-iteration loop on empty-reasoning. A second prompt against
the same model would produce the same shape.
- Falls through to _fallback_title_from_exchange for a local-summary
title.
Tests updated to invert the previous reasoning-retry assertions:
- test_aux_short_circuits_on_empty_reasoning_without_retrying
- test_aux_still_retries_finish_length_without_reasoning
- test_agent_route_short_circuits_on_empty_reasoning_without_retrying
- test_agent_route_still_retries_finish_length_without_reasoning
Companion agent-side work (LM Studio classifier for custom: providers)
is tracked separately on the hermes-agent side; this WebUI fix is the
belt-and-braces guard so the loop stops regardless of agent classifier
state.
Reported by @darkopetrovic. Closes#2083.
Co-authored-by: darkopetrovic <darkopetrovic@users.noreply.github.com>
(cherry picked from commit efeae4a86e)