From 67e29fa9918f78fcdec52aa4da6fc23380acaed7 Mon Sep 17 00:00:00 2001
From: dobby-d-elf
Date: Mon, 11 May 2026 13:13:26 -0600
Subject: [PATCH 01/12] feat: add opt-in streaming text fade

---
 STREAMING_FADE_HANDOFF.md      | 289 +++++++++++++++++
 api/config.py                  |   2 +
 static/boot.js                 |   2 +
 static/i18n.js                 |  18 ++
 static/index.html              |   7 +
 static/messages.js             | 564 +++++++++++++++++++++++----------
 static/panels.js               |  14 +-
 static/style.css               |   8 +
 tests/test_smooth_text_fade.py | 290 +++++++++++++++++
 9 files changed, 1032 insertions(+), 162 deletions(-)
 create mode 100644 STREAMING_FADE_HANDOFF.md
 create mode 100644 tests/test_smooth_text_fade.py

diff --git a/STREAMING_FADE_HANDOFF.md b/STREAMING_FADE_HANDOFF.md
new file mode 100644
index 00000000..ea3c8aeb
--- /dev/null
+++ b/STREAMING_FADE_HANDOFF.md
@@ -0,0 +1,289 @@
+# Streaming Fade Text Effect Handoff
+
+## Summary
+
+This branch adds an opt-in **Fade text effect** preference for HermesWebUI streaming assistant responses.
+
+When enabled, newly streamed assistant words fade in instead of appearing via the default incremental markdown renderer. The goal is a ChatGPT/OpenWebUI-like animated streaming feel while still catching up to high-throughput model output.
+
+The feature is **off by default** for performance.
+
+## User-facing behavior
+
+- New setting: **Settings → Preferences → Fade text effect**
+- Runtime global: `window._fadeTextEffect`
+- Default: `false`
+- When enabled:
+  - assistant stream uses a playout buffer rather than immediately rendering the full incoming chunk
+  - visible text advances at adaptive speed based on live incoming word velocity, backlog, and stream age
+  - new words are wrapped in spans and animated with opacity-only fade
+  - high-speed output uses rolling phrase-sized waves instead of giant block pops
+  - Hermes' bright live cursor is hidden during fade mode
+
+## Main files changed
+
+### `static/messages.js`
+
+Core streaming implementation inside `attachLiveStream(...)`.
+
+Added local fade state:
+
+- `_streamFadeVisibleText`
+- `_streamFadeWordCarry`
+- `_streamFadeWordBornAt`
+- `_streamFadeArrivalWps`
+- `_streamFadeLastRevealCount`
+- `_streamFadeLatestAnimationEndAt`
+
+Key helpers:
+
+- `_resetStreamFadeState()`
+- `_cancelPendingStreamRender()`
+- `_shouldUseStreamFade()`
+- `_streamFadeWordCountOf(text)`
+- `_streamFadeNextText(targetText)`
+- `_renderStreamingFadeMarkdown(displayText)`
+- `_wrapStreamingFadeWords(root)`
+- `_drainStreamFadeBeforeDone(onDone)`
+
+Important behavior:
+
+- Fade mode renders at ~60fps (`16ms`) while default streaming remains ~15fps (`66ms`).
+- Default SMD streaming path remains intact when fade mode is off.
+- On `done`, fade mode drains remaining buffered text and waits for the final stagger/fade window before the final `renderMessages()` replacement.
+- Prefix resets now call `_resetStreamFadeState()` so stale birth timestamps do not leak across markdown/tool-call rewrites.
+
+### `static/style.css`
+
+Adds opacity-only streaming fade CSS:
+
+```css
+.stream-fade-word.is-new {
+  animation: stream-fade-word-in var(--stream-fade-ms,140ms) ease-out both;
+}
+@keyframes stream-fade-word-in { from { opacity:0; } to { opacity:1; } }
+```
+
+Also hides the live cursor during fade mode:
+
+```css
+[data-live-assistant="1"]:last-child .msg-body.stream-fade-active > :last-child::after,
+[data-live-assistant="1"]:last-child .msg-body.stream-fade-active:not(:has(> *))::after {
+  display:none;
+  content:none;
+}
+```
+
+### Settings plumbing
+
+- `api/config.py`
+  - adds `fade_text_effect` default and bool key
+- `static/boot.js`
+  - initializes `window._fadeTextEffect`
+- `static/index.html`
+  - adds Preferences checkbox
+- `static/panels.js`
+  - loads, autosaves, and saves the setting
+- `static/i18n.js`
+  - adds locale strings for all supported locales
+
+### Tests
+
+New file:
+
+- `tests/test_smooth_text_fade.py`
+
+Coverage includes:
+
+- setting persistence/config plumbing
+- Preferences UI plumbing
+- i18n key presence
+- fade helper presence
+- executable Node regressions that invoke `_streamFadeNextText(...)`
+- speed-ramp behavior
+- high-speed rolling-wave behavior
+- done-drain behavior
+- CSS expectations
+- cursor hiding
+
+## Tunable constants
+
+Defined near the top of `attachLiveStream(...)` in `static/messages.js`:
+
+```js
+const _STREAM_FADE_MS=140;
+const _STREAM_FADE_WAVE_MS=320;
+const _STREAM_FADE_MAX_STAGGER_MS=520;
+```
+
+Meaning:
+
+- `_STREAM_FADE_MS`: base fade duration for normal streaming
+- `_STREAM_FADE_WAVE_MS`: longer duration for high-speed multi-word waves
+- `_STREAM_FADE_MAX_STAGGER_MS`: max stagger spread across newly inserted words
+
+Adaptive playout speed currently uses:
+
+```js
+const baseWps = 30 + Math.min(streamAgeSeconds * 4, 35); // 30 → 65 wps
+const arrivalWps = _streamFadeArrivalWps ? Math.min(_streamFadeArrivalWps * 2.4 + 20, 320) : 0;
+const backlogWps = backlogWords > 0 ? Math.min(30 + backlogWords * 8, 420) : 0;
+const wordsPerSecond = Math.min(420, Math.max(baseWps, arrivalWps, backlogWps));
+```
+
+Rolling burst floor:
+
+```js
+const burstFloor = backlogWords >= 120 ? 24
+  : backlogWords >= 60 ? 18
+  : backlogWords >= 30 ? 12
+  : wordsPerSecond >= 300 ? 8
+  : wordsPerSecond >= 220 ? 6
+  : 0;
+```
+
+High-speed waves then use:
+
+```js
+const fadeMs = revealedThisFrame >= 8 ? _STREAM_FADE_WAVE_MS
+  : revealedThisFrame >= 4 ? 240
+  : _STREAM_FADE_MS;
+
+const waveStepMs = revealedThisFrame >= 18 ? 18
+  : revealedThisFrame >= 8 ? 22
+  : revealedThisFrame >= 4 ? 16
+  : 10;
+```
+
+## Design decisions and why
+
+### Why not use only OpenWebUI's renderer?
+
+A wholesale renderer transplant was avoided. Hermes keeps its existing streaming markdown path as default, and fade mode is a selective cosmetic layer.
+
+### Why a playout buffer?
+
+Hermes receives backend stream chunks that can arrive faster or more bursty than desired visually. Rendering each chunk immediately can pop large text blocks into the DOM. The playout buffer separates:
+
+- text received from backend (`assistantText`)
+- text currently visible (`_streamFadeVisibleText`)
+
+### Why adaptive speed?
+
+A fixed reveal rate felt robotic and lagged behind faster models. Earlier attempts using session-wide average arrival rate failed when the model spent time reasoning before writing because the denominator inflated and the ramp never triggered.
+
+Current approach tracks **live target-word arrival velocity** using deltas:
+
+```js
+const instantArrivalWps = (targetWords - _streamFadeLastTargetWords) * 1000 / arrivalElapsedMs;
+_streamFadeArrivalWps = _streamFadeArrivalWps
+  ? (_streamFadeArrivalWps * 0.65 + instantArrivalWps * 0.35)
+  : instantArrivalWps;
+```
+
+Then playout deliberately exceeds arrival velocity so it catches up.
+
+### Why rolling waves?
+
+At very high throughput, revealing too many words in one frame felt chunky and made the fade almost disappear. The current implementation reduces one-frame burst size and stretches/staggers high-speed waves across several hundred milliseconds.
+
+This makes fast output feel more like animated text sweeping in rather than paragraph blocks appearing.
+
+## Performance notes
+
+Fade mode is more expensive than the default streaming path because it re-renders markdown and wraps visible text nodes during active streaming.
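
To put that cost in rough numbers: the cadences quoted earlier in this handoff (`16ms` for fade mode, `66ms` for the default path) imply about four times as many render passes per second of streaming. The sketch below shows that arithmetic together with a coalescing throttle of the kind a capped render cadence suggests. It is not taken from `static/messages.js`; `renderIntervalMs`, `maxRenderPasses`, and `makeRenderThrottle` are hypothetical names, and only the two interval values come from the handoff.

```javascript
// Hypothetical helpers for reasoning about render cost. Only the 16ms and
// 66ms cadences are taken from the handoff; everything else is an assumption.
const FADE_RENDER_INTERVAL_MS = 16;    // ~60fps, fade mode
const DEFAULT_RENDER_INTERVAL_MS = 66; // ~15fps, default streaming path

function renderIntervalMs(fadeEnabled) {
  return fadeEnabled ? FADE_RENDER_INTERVAL_MS : DEFAULT_RENDER_INTERVAL_MS;
}

// Upper bound on render passes for a stream lasting streamMs milliseconds.
function maxRenderPasses(streamMs, fadeEnabled) {
  return Math.ceil(streamMs / renderIntervalMs(fadeEnabled));
}

// Coalescing throttle: any number of chunk arrivals inside one interval
// results in at most one render call. `now` is injectable so the logic can
// be exercised without real timers.
function makeRenderThrottle(render, intervalMs, now = () => Date.now()) {
  let lastRenderAt = -Infinity;
  let dirty = false;
  return {
    markDirty() { dirty = true; },
    tick() {
      const t = now();
      if (!dirty || t - lastRenderAt < intervalMs) return false;
      dirty = false;
      lastRenderAt = t;
      render();
      return true;
    },
  };
}
```

For a 10-second response this bounds fade mode at 625 passes against 152 for the default path, which is the main reason the feature stays opt-in.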
+
+Mitigations:
+
+- feature is opt-in and off by default
+- default streaming-markdown path remains unchanged when disabled
+- fade render cadence is capped at ~60fps
+- skip wrapping inside `pre`, `code`, `script`, `style`, `textarea`, `svg`, and `math`
+- animation is opacity-only, compositor-friendly
+
+Expected impact:
+
+- fine on modern desktop/Apple Silicon hardware
+- higher CPU/battery use during long/high-speed responses
+- users can disable it instantly from Preferences
+
+## Verification performed
+
+Commands run successfully:
+
+```bash
+cd /Users/agent/HermesWebUI
+PY=/Users/agent/.hermes/hermes-agent/venv/bin/python
+$PY -m pytest tests/test_smooth_text_fade.py tests/test_1003_preferences_autosave.py tests/test_streaming_markdown.py tests/test_chinese_locale.py tests/test_japanese_locale.py tests/test_korean_locale.py tests/test_russian_locale.py tests/test_spanish_locale.py -q
+node --check static/messages.js static/panels.js static/boot.js static/i18n.js
+$PY -m py_compile api/config.py
+git diff --check
+```
+
+Latest result before writing this handoff:
+
+```text
+99 passed
+```
+
+Also performed:
+
+- dead/debug scan over diff for `TODO`, `FIXME`, `console.log`, `debugger`, stale `100ms`, stale `220ms`, stale `48` burst constants
+- review cleanup: blocked late `token` / `reasoning` / `interim_assistant` mutations during fade done-drain, moved fade wave calculations out of the per-word hot path, and made manual Settings save refresh `window._fadeTextEffect`
+- HermesWebUI restart via launchctl
+- live asset verification via `curl http://127.0.0.1:8787/static/messages.js`
+- real chat/SSE smoke test: temp session, prompt `Reply with exactly: OK`, received `OK`, got `done`, deleted temp session
+
+## Current service state when last verified
+
+- HermesWebUI runs on port `8787`
+- Restarted during validation
+- Health endpoint returned OK
+
+Useful checks:
+
+```bash
+curl -fsS http://127.0.0.1:8787/health
+curl -fsS http://127.0.0.1:8787/static/messages.js | grep -E "_STREAM_FADE_WAVE_MS=320|_STREAM_FADE_MAX_STAGGER_MS=520|burstFloor=backlogWords>=120\?24"
+curl -fsS http://127.0.0.1:8787/static/style.css | grep -E "var\(--stream-fade-ms,140ms\)|stream-fade-word-in"
+```
+
+## Known caveats
+
+- LLM telemetry often reports **tokens/sec**, while the UI reveals visible words. These are not equivalent.
+- The renderer cannot reveal text before complete visible text exists.
+- If backend chunks arrive as very large bursts, the rolling-wave logic smooths them but may still require subjective tuning.
+- The current visual is close, but final merge review should include manual browser testing with:
+  - normal-speed model
+  - high-throughput model (~100+ tok/s)
+  - long markdown responses
+  - code blocks
+  - lists/tables
+  - tool-call-heavy responses
+
+## Suggested next review steps
+
+1. Manually test in browser after hard refresh (`Cmd+Shift+R`).
+2. Try a high-throughput long essay and tune only these constants if needed:
+   - `_STREAM_FADE_WAVE_MS`
+   - `_STREAM_FADE_MAX_STAGGER_MS`
+   - burst floor thresholds
+   - `waveStepMs`
+3. Check the diff for whether the `done` handler reindent is acceptable for the PR. It is intentional because the original done body is now wrapped in `_finishDone` so fade mode can drain before final DOM replacement.
+4. If submitting PR, mention the feature is opt-in/off-by-default and the default streaming markdown path remains unchanged.
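
For the constant tuning in step 2, it can help to evaluate the formulas outside the browser. The following is a standalone sketch that copies the arithmetic quoted under "Tunable constants" and "Why adaptive speed?" into pure functions. The function names (`smoothArrivalWps`, `playoutWps`, `burstFloor`, `waveTiming`) and their parameter shapes are hypothetical and are not the names used in `static/messages.js`; how the real code feeds these values into each other per frame is not shown in the handoff and is not modeled here.

```javascript
// Standalone copies of the formulas quoted in the handoff. Names and
// signatures are assumptions made for illustration only.
const STREAM_FADE_MS = 140;
const STREAM_FADE_WAVE_MS = 320;

// Smoothed live arrival velocity: 0.65 weight on the previous estimate,
// 0.35 on the newest delta, seeded with the first observed delta.
function smoothArrivalWps(prevWps, newWords, elapsedMs) {
  const instant = newWords * 1000 / elapsedMs;
  return prevWps ? prevWps * 0.65 + instant * 0.35 : instant;
}

// Playout speed: the largest of the age ramp, the boosted arrival rate,
// and the backlog pressure term, capped at 420 words per second.
function playoutWps({ streamAgeSeconds, arrivalWps, backlogWords }) {
  const baseWps = 30 + Math.min(streamAgeSeconds * 4, 35); // 30 -> 65 wps
  const arrival = arrivalWps ? Math.min(arrivalWps * 2.4 + 20, 320) : 0;
  const backlog = backlogWords > 0 ? Math.min(30 + backlogWords * 8, 420) : 0;
  return Math.min(420, Math.max(baseWps, arrival, backlog));
}

// Minimum words revealed per frame once backlog or speed gets high.
function burstFloor(backlogWords, wordsPerSecond) {
  return backlogWords >= 120 ? 24
    : backlogWords >= 60 ? 18
    : backlogWords >= 30 ? 12
    : wordsPerSecond >= 300 ? 8
    : wordsPerSecond >= 220 ? 6
    : 0;
}

// Fade duration and per-word stagger step for one frame's reveal count.
function waveTiming(revealedThisFrame) {
  const fadeMs = revealedThisFrame >= 8 ? STREAM_FADE_WAVE_MS
    : revealedThisFrame >= 4 ? 240
    : STREAM_FADE_MS;
  const waveStepMs = revealedThisFrame >= 18 ? 18
    : revealedThisFrame >= 8 ? 22
    : revealedThisFrame >= 4 ? 16
    : 10;
  return { fadeMs, waveStepMs };
}
```

One thing this makes easy to see: a backlog of 50 words already saturates the 420 wps cap (`30 + 50 * 8 = 430`), so the backlog term dominates quickly and the burst floor thresholds are likely the more sensitive knobs for very bursty backends.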
+
+## Files to include in PR
+
+Expected modified/new files:
+
+```text
+api/config.py
+static/boot.js
+static/i18n.js
+static/index.html
+static/messages.js
+static/panels.js
+static/style.css
+tests/test_smooth_text_fade.py
+STREAMING_FADE_HANDOFF.md
+```
diff --git a/api/config.py b/api/config.py
index 0c241ce5..bcd8b34f 100644
--- a/api/config.py
+++ b/api/config.py
@@ -3880,6 +3880,7 @@ _SETTINGS_DEFAULTS = {
     "send_key": "enter", # 'enter' or 'ctrl+enter'
     "show_token_usage": False, # show input/output token badge below assistant messages
     "show_tps": False, # show tokens-per-second chip in assistant message headers
+    "fade_text_effect": False, # animate newly streamed words with a lightweight fade-in effect
     "show_cli_sessions": False, # merge CLI sessions from state.db into the sidebar
     "sync_to_insights": False, # mirror WebUI token usage to state.db for /insights
     "check_for_updates": True, # check if webui/agent repos are behind upstream
@@ -4008,6 +4009,7 @@ _SETTINGS_BOOL_KEYS = {
     "onboarding_completed",
     "show_token_usage",
     "show_tps",
+    "fade_text_effect",
     "show_cli_sessions",
     "sync_to_insights",
     "check_for_updates",
diff --git a/static/boot.js b/static/boot.js
index e08ad6e9..5bf48317 100644
--- a/static/boot.js
+++ b/static/boot.js
@@ -1376,6 +1376,7 @@ function applyBotName(){
     window._sendKey=s.send_key||'enter';
     window._showTokenUsage=!!s.show_token_usage;
     window._showTps=!!s.show_tps;
+    window._fadeTextEffect=!!s.fade_text_effect;
     window._showCliSessions=!!s.show_cli_sessions;
     window._soundEnabled=!!s.sound_enabled;
     window._notificationsEnabled=!!s.notifications_enabled;
@@ -1412,6 +1413,7 @@ function applyBotName(){
     window._sendKey='enter';
     window._showTokenUsage=false;
     window._showTps=false;
+    window._fadeTextEffect=false;
     window._showCliSessions=false;
     window._soundEnabled=false;
     window._notificationsEnabled=false;
diff --git a/static/i18n.js b/static/i18n.js
index 20820631..e27c8fac 100644
--- a/static/i18n.js
+++ b/static/i18n.js
@@ -230,6 +230,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Interrupted — sending new message',
     settings_label_busy_input_mode: 'Busy input mode',
     settings_desc_busy_input_mode: 'Controls what happens when you send a message while the agent is running. Queue waits; Interrupt cancels and starts fresh; Steer injects a correction mid-turn without interrupting (falls back to queue when agent or stream unavailable).',
+    settings_label_fade_text_effect: 'Fade text effect',
+    settings_desc_fade_text_effect: 'Fade newly streamed words in while the assistant is responding. Similar to OpenWebUI; off by default for maximum performance.',
     settings_busy_input_mode_queue: 'Queue follow-up',
     settings_busy_input_mode_interrupt: 'Interrupt current turn',
     settings_busy_input_mode_steer: 'Steer (mid-turn correction)',
@@ -1320,6 +1322,8 @@ const LOCALES = {
     busy_interrupt_confirm: '中断 — 新しいメッセージを送信中',
     settings_label_busy_input_mode: 'ビジー時の入力モード',
     settings_desc_busy_input_mode: 'エージェント実行中にメッセージを送信した時の動作を制御します。Queue は待機、Interrupt はキャンセルして再開、Steer は中断せずにターン中に修正を注入します (エージェントやストリームが利用不可ならキューにフォールバック)。',
+    settings_label_fade_text_effect: 'テキストのフェード効果',
+    settings_desc_fade_text_effect: 'アシスタントの応答中に新しくストリーミングされた単語をフェードインします。OpenWebUI に似た表示です。最大パフォーマンスのため既定ではオフです。',
     settings_busy_input_mode_queue: 'フォローアップをキュー',
     settings_busy_input_mode_interrupt: '現在のターンを中断',
     settings_busy_input_mode_steer: 'ステア (ターン中の修正)',
@@ -2366,6 +2370,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Прервано — отправка нового сообщения',
     settings_label_busy_input_mode: 'Режим ввода при занятости',
     settings_desc_busy_input_mode: 'Определяет поведение при отправке сообщения во время работы агента. Очередь ждёт; Прерывание отменяет и начинает заново; Steer внедряет коррекцию без прерывания.',
+    settings_label_fade_text_effect: 'Эффект плавного появления текста',
+    settings_desc_fade_text_effect: 'Плавно показывает новые слова во время ответа ассистента. Похоже на OpenWebUI; по умолчанию выключено для максимальной производительности.',
     settings_busy_input_mode_queue: 'Поставить в очередь',
     settings_busy_input_mode_interrupt: 'Прервать текущий оборот',
     settings_busy_input_mode_steer: 'Steer (прерывание + отправка)',
@@ -3427,6 +3433,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Interrumpido \u2014 enviando nuevo mensaje',
     settings_label_busy_input_mode: 'Modo de entrada ocupada',
     settings_desc_busy_input_mode: 'Controla qué sucede al enviar mensajes mientras el agente está activo. Cola espera; Interrumpir cancela y empieza de nuevo; Steer inyecta una corrección sin interrumpir (usa cola si el agente no está disponible).',
+    settings_label_fade_text_effect: 'Efecto de desvanecimiento de texto',
+    settings_desc_fade_text_effect: 'Hace aparecer gradualmente las palabras nuevas mientras el asistente responde. Similar a OpenWebUI; desactivado por defecto para máximo rendimiento.',
     settings_busy_input_mode_queue: 'Poner en cola',
     settings_busy_input_mode_interrupt: 'Interrumpir turno actual',
     settings_busy_input_mode_steer: 'Steer (corrección a mitad de turno)',
@@ -4427,6 +4435,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Unterbrochen \u2014 neue Nachricht wird gesendet',
     settings_label_busy_input_mode: 'Eingabemodus bei Besch\u00e4ftigung',
     settings_desc_busy_input_mode: 'Steuert, was passiert, wenn Sie w\u00e4hrend der Agentenaktivit\u00e4t eine Nachricht senden. Warteschlange wartet; Unterbrechen bricht ab und startet neu; Steer f\u00fcgt eine Korrektur ein ohne zu unterbrechen.',
+    settings_label_fade_text_effect: 'Text-Fade-Effekt',
+    settings_desc_fade_text_effect: 'Blendet neu gestreamte Wörter während der Antwort des Assistenten sanft ein. Ähnlich wie OpenWebUI; für maximale Leistung standardmäßig deaktiviert.',
     settings_busy_input_mode_queue: 'In Warteschlange einreihen',
     settings_busy_input_mode_interrupt: 'Aktuellen Durchgang unterbrechen',
     settings_busy_input_mode_steer: 'Steer (Korrektur ohne Unterbrechung)',
@@ -5474,6 +5484,8 @@ const LOCALES = {
     busy_interrupt_confirm: '已中断 — 正在发送新消息',
     settings_label_busy_input_mode: '忙碌输入模式',
     settings_desc_busy_input_mode: '控制在代理运行时发送消息的行为。队列等待;中断取消并重新开始;Steer中途注入纠正,不中断。',
+    settings_label_fade_text_effect: '文本淡入效果',
+    settings_desc_fade_text_effect: '在助手回复时让新流式输出的词语淡入显示。类似 OpenWebUI;为获得最佳性能默认关闭。',
     settings_busy_input_mode_queue: '加入队列',
     settings_busy_input_mode_interrupt: '中断当前回合',
     settings_busy_input_mode_steer: 'Steer(中断 + 发送)',
@@ -7009,6 +7021,8 @@ const LOCALES = {
     busy_interrupt_confirm: '\u5df2\u4e2d\u65ad \u2014 \u6b63\u5728\u767c\u9001\u65b0\u8a0a\u606f',
     settings_label_busy_input_mode: '\u5fd9\u788c\u8f38\u5165\u6a21\u5f0f',
     settings_desc_busy_input_mode: '\u63a7\u5236\u5728\u4ee3\u7406\u904b\u884c\u6642\u767c\u9001\u8a0a\u606f\u7684\u884c\u70ba\u3002\u4f47\u5217\u7b49\u5f85\uff1b\u4e2d\u65b7\u53d6\u6d88\u4e26\u91cd\u65b0\u958b\u59cb\uff1bSteer\u4e2d\u9014\u6ce8\u5165\u7d3a\u6b63\uff0c\u4e0d\u4e2d\u65b7\u3002',
+    settings_label_fade_text_effect: '文字淡入效果',
+    settings_desc_fade_text_effect: '在助理回覆時讓新串流輸出的詞語淡入顯示。類似 OpenWebUI;為獲得最佳效能預設關閉。',
     settings_busy_input_mode_queue: '\u52a0\u5165\u4f47\u5217',
     settings_busy_input_mode_interrupt: '\u4e2d\u65ad\u7576\u524d\u56de\u5408',
     settings_busy_input_mode_steer: 'Steer\uff08\u4e2d\u9014\u7d3a\u6b63\uff09',
@@ -7555,6 +7569,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Interrompido — enviando nova mensagem',
     settings_label_busy_input_mode: 'Modo de input ocupado',
     settings_desc_busy_input_mode: 'Controla o que acontece ao enviar mensagem com agente rodando. Fila espera; Interromper cancela; Steer injeta correção.',
+    settings_label_fade_text_effect: 'Efeito de fade no texto',
+    settings_desc_fade_text_effect: 'Faz novas palavras aparecerem gradualmente enquanto o assistente responde. Similar ao OpenWebUI; desativado por padrão para melhor desempenho.',
     settings_busy_input_mode_queue: 'Enfileirar follow-up',
     settings_busy_input_mode_interrupt: 'Interromper turno atual',
     settings_busy_input_mode_steer: 'Steer (correção no meio do turno)',
@@ -8521,6 +8537,8 @@ const LOCALES = {
     busy_interrupt_confirm: 'Interrupted — sending new message',
     settings_label_busy_input_mode: '작업 중 입력 방식',
     settings_desc_busy_input_mode: '에이전트가 실행 중일 때 메시지를 보내면 어떻게 처리할지 제어합니다. 대기는 다음 차례까지 기다리고, 중단은 현재 작업을 취소하고 새로 시작하며, 조정은 현재 작업을 중단하지 않고 중간 수정 사항을 전달합니다(에이전트 또는 스트림을 사용할 수 없으면 대기로 전환).',
+    settings_label_fade_text_effect: '텍스트 페이드 효과',
+    settings_desc_fade_text_effect: '어시스턴트가 응답하는 동안 새로 스트리밍되는 단어를 부드럽게 표시합니다. OpenWebUI와 비슷하며, 최대 성능을 위해 기본값은 꺼짐입니다.',
     settings_busy_input_mode_queue: '후속 메시지 대기',
     settings_busy_input_mode_interrupt: '현재 작업 중단',
     settings_busy_input_mode_steer: '조정(중간 수정)',
diff --git a/static/index.html b/static/index.html
index 2fd68911..40e0d2bb 100644
--- a/static/index.html
+++ b/static/index.html
@@ -981,6 +981,13 @@
Displays tokens per second in assistant message headers while streaming and after a response completes. Off by default.
+
+ +
Fade newly streamed words in while the assistant is responding. Similar to OpenWebUI; off by default for maximum performance.
+