Reasoning models (Qwen3-thinking via LM Studio, DeepSeek-R1, Kimi-K2,
etc.) can burn their entire output budget on hidden reasoning tokens and
emit no visible content. The previous title-generation retry path
classified that as llm_length and doubled the budget — but the second
call produces the same shape, so the retry only doubled the GPU/credit
burn. Repeated across the two prompts in _title_prompts() this came to
~3000 reasoning tokens of GPU work per new chat. On local LM Studio
servers behind a custom: provider (where is_lmstudio=False means
reasoning_effort: none never reaches the model) it manifested as the GPU
never going idle after a prompt.
Fix:
- _extract_title_response: classify reasoning-bearing empty responses
as llm_empty_reasoning regardless of finish_reason. The presence of
reasoning_content is the diagnostic signal, not finish_reason.
- _title_retry_status: drop llm_empty_reasoning from the retry set.
Length-truncated responses WITHOUT reasoning still retry (those are
legitimately recoverable by a larger budget).
- Add _title_should_skip_remaining_attempts() and break out of the
prompt-iteration loop on empty-reasoning. A second prompt against
the same model would produce the same shape.
- Falls through to _fallback_title_from_exchange for a local-summary
title.
Tests updated to invert the previous reasoning-retry assertions:
- test_aux_short_circuits_on_empty_reasoning_without_retrying
- test_aux_still_retries_finish_length_without_reasoning
- test_agent_route_short_circuits_on_empty_reasoning_without_retrying
- test_agent_route_still_retries_finish_length_without_reasoning
Companion agent-side work (LM Studio classifier for custom: providers)
is tracked separately on the hermes-agent side; this WebUI fix is the
belt-and-braces guard so the loop stops regardless of agent classifier
state.
Reported by @darkopetrovic. Closes#2083.
Co-authored-by: darkopetrovic <darkopetrovic@users.noreply.github.com>
(cherry picked from commit efeae4a86e)
PR #2067 made TestVoiceModePreferenceGate.test_settings_pane_has_voice_mode_i18n_keys
adaptive via self.LOCALES but only defined LOCALES on the sibling class
TestComposerVoiceButtonI18n. AttributeError on CI.
Mirror the tuple to TestVoiceModePreferenceGate so the count assert resolves
to 10 with Italian present.
Co-authored-by: Samuel Gudi <samuel.gudi.official@gmail.com>
6 test files had hardcoded locale counts/lists that broke when
the Italian locale block was added:
- test_issue1488_composer_voice_buttons.py: added 'it' to LOCALES,
replaced assert count == 9 with len(self.LOCALES)
- test_issue1560_password_env_var_lock.py: added 'it' to LOCALES
- test_1560_password_env_var_no_op.py: added 'it' to EXPECTED_LOCALES
- test_login_locale_parity.py: bumped floor from 9 to 10, added 'it'
- test_stage268_opus_followups.py: bumped floor from 9 to 10
(cherry picked from commit f5e42cec9b)
Opus stage-339 review SHOULD-FIX items:
1. server.py: drop 'unsafe-eval' from CSP report-only policy.
Verified by grepping all production JS — zero matches for eval(),
new Function(), or string-form setTimeout/setInterval. Keeping it
was a gratuitous privilege.
2. server.py: add https://cdn.jsdelivr.net to script-src + style-src.
index.html loads Prism/xterm/katex from this CDN with SRI hashes —
without the allowance every page load fires known-good CSP violations
that drown out real signal once a collector is wired.
3. api/commands.py: sanitize plugin command error. Previously returned
f'Plugin command error: {exc}' which would leak paths/env from
FileNotFoundError('/etc/something/secret.key') etc. Now returns only
the exception type name; full traceback goes to server log.
Test asserts updated to match the new policy shape.
Co-authored-by: Opus advisor <opus-advisor@hermes.local>