From fb822239eaa1a84fbb7da0d6715436e07bfc13c6 Mon Sep 17 00:00:00 2001 From: Sanjay Santhanam <51058514+Sanjays2402@users.noreply.github.com> Date: Fri, 8 May 2026 21:14:22 -0700 Subject: [PATCH] fix(#1937): close endless-scroll prefetch vs Start-jump race with generation-token + mutex MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The originally-proposed fix (gate _ensureAllMessagesLoaded on the existing _loadingOlder flag) does not actually close the race. By the time the prefetch reaches its post-await body, it has already cleared the entry- gate that reads _loadingOlder, so a same-flag check inside the resolved callback would be a no-op for an in-flight request. The actual fix is two-pronged: 1. New module-scoped _messagesGeneration counter, bumped every time S.messages is wholesale-replaced. _loadOlderMessages snapshots it BEFORE its await and re-checks after — if it changed, the prepend is aborted. This is the canonical async-invalidation pattern. 2. _ensureAllMessagesLoaded now claims the _loadingOlder mutex around its body so a new prefetch cannot start mid-replace and concurrent ensure-all calls (rapid double-click on Start) serialize cleanly. It bumps the generation token before mutating S.messages, yields until any in-flight prefetch finishes, and resets _oldestIdx so a subsequent prefetch cannot request stale older messages. Also adds the same-session / _loadingSessionId guards that the original ensure-all body was missing post-await — if the user switched sessions mid-flight, the old code would happily overwrite the new session's messages with the previous session's full history. 12 new regression tests in tests/test_issue1937_endless_scroll_jumpstart_race.py lock in: generation token declaration, bump-helper presence, snapshot- before-await ordering, post-await-abort behaviour, mutex acquisition and finally-release, yield-then-claim ordering when a prefetch is in flight, generation bump during the wait phase, _oldestIdx reset, and the new session-switch guard. Closes #1937. --- CHANGELOG.md | 8 +- static/sessions.js | 82 ++++++- ...issue1937_endless_scroll_jumpstart_race.py | 212 ++++++++++++++++++ 3 files changed, 292 insertions(+), 10 deletions(-) create mode 100644 tests/test_issue1937_endless_scroll_jumpstart_race.py diff --git a/CHANGELOG.md b/CHANGELOG.md index eeaa0e04..4af4eec2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Hermes Web UI -- Changelog +## [Unreleased] + +### Fixed + +- **#1937 — Race: endless-scroll prefetch vs Start-jump's `_ensureAllMessagesLoaded` could duplicate messages.** With both `session_jump_buttons` AND `session_endless_scroll` enabled, an in-flight `_loadOlderMessages` prefetch racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could prepend a duplicate page if the prefetch resolved last. The naive fix suggested in the report (gate ensure-all on the existing `_loadingOlder` flag) does not actually close the race — by the time the prefetch reaches its post-await body, it has already cleared the entry-gate that reads `_loadingOlder`, so a same-flag check inside the resolved callback is a no-op. The actual fix is a generation-token + mutex pair: (1) `_loadOlderMessages` snapshots a new module-scoped `_messagesGeneration` counter BEFORE its `await api(...)` and re-checks it after, aborting the prepend if any wholesale-replace bumped the token mid-flight; (2) `_ensureAllMessagesLoaded` claims the `_loadingOlder` mutex around its body (so a NEW prefetch cannot start mid-replace, and concurrent ensure-all calls from rapid double-clicks on Start serialize cleanly), bumps the generation token before mutating `S.messages`, yields until any in-flight prefetch's `finally`-block releases the mutex, and resets `_oldestIdx` so a subsequent prefetch cannot send a stale `msg_before` index. Also adds the same-session and `_loadingSessionId` guards that the original ensure-all body was missing post-await. (`static/sessions.js`, `tests/test_issue1937_endless_scroll_jumpstart_race.py` — 12 new regression tests) + ## [v0.51.30] — 2026-05-08 — 3-PR contributor batch (Release G: offline recovery + PWA hardening + opt-in session jump buttons + opt-in endless-scroll) ### Added (3 PRs, all from @ai-ag2026) @@ -31,7 +37,7 @@ ### Follow-up items filed (non-blocking) -- **Race between endless-scroll prefetch and Start-jump's `_ensureAllMessagesLoaded`** — with both opt-ins ON, an in-flight prefetch (started by 1.5x-viewport trigger) racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could produce duplicate messages if the prefetch resolves last. Narrow window, but the fix is to gate `_ensureAllMessagesLoaded` on the existing `_loadingOlder` flag. +- **Race between endless-scroll prefetch and Start-jump's `_ensureAllMessagesLoaded`** — with both opt-ins ON, an in-flight prefetch (started by 1.5x-viewport trigger) racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could produce duplicate messages if the prefetch resolves last. Narrow window, but the fix is to gate `_ensureAllMessagesLoaded` on the existing `_loadingOlder` flag. **Resolved in Unreleased — see #1937 entry above; final fix uses generation-token + mutex rather than the originally-suggested flag gate, which would not have closed the race.** - **#1928 locale parity** — `session_jump_*` and `settings_*_session_jump_buttons` keys are English literals in ja/ru/es/de/zh/zh-Hant/pt/ko. Default-OFF + English fallback works, but breaks the locale-parity standard set by #1929 and #1891 in the same release. diff --git a/static/sessions.js b/static/sessions.js index 8a88217a..62488053 100644 --- a/static/sessions.js +++ b/static/sessions.js @@ -998,6 +998,19 @@ let _loadingOlder = false; // oldest message currently loaded in S.messages. Starts at 0 when all // messages are loaded, or > 0 when truncated by msg_limit. let _oldestIdx = 0; +// Generation token bumped every time S.messages is wholesale-replaced +// (rather than incrementally extended). _loadOlderMessages snapshots it +// before its `await` and re-checks after, so a late-resolving prefetch +// does not prepend onto a transcript that was rebuilt under it +// (e.g. by _ensureAllMessagesLoaded after a Start-jump). See #1937. +let _messagesGeneration = 0; +function _bumpMessagesGeneration() { + // Wrap to keep the counter bounded; the only operation that matters is + // strict inequality between the snapshot and the post-await read, so any + // monotonic bump is sufficient. + _messagesGeneration = (_messagesGeneration + 1) | 0; + return _messagesGeneration; +} async function _loadOlderMessages() { if (_loadingOlder || !_messagesTruncated) return; @@ -1005,6 +1018,11 @@ async function _loadOlderMessages() { if (!sid || !S.messages.length) return; if (_oldestIdx <= 0) { _messagesTruncated = false; return; } _loadingOlder = true; + // Snapshot the generation BEFORE we await. If S.messages is wholesale + // replaced while the request is in flight, the post-await check below + // bails out so we never prepend stale older messages onto a freshly + // rebuilt transcript (#1937). + const startGeneration = _messagesGeneration; try { const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0&msg_before=${_oldestIdx}&msg_limit=${_INITIAL_MSG_LIMIT}`); // Guard: api() may have redirected (401) and returned undefined. @@ -1017,6 +1035,13 @@ async function _loadOlderMessages() { if (!data || !data.session) return; if (!S.session || S.session.session_id !== sid) return; if (_loadingSessionId !== null && _loadingSessionId !== sid) return; + // Generation guard: another code path (typically jumpToSessionStart → + // _ensureAllMessagesLoaded) may have replaced S.messages while we were + // awaiting. Prepending older messages onto that replacement would + // duplicate the head of the transcript. Detect via the generation + // counter and abort cleanly. _oldestIdx and _messagesTruncated were + // already reset by the wholesale-replace path, so no rollback needed. + if (_messagesGeneration !== startGeneration) return; const olderMsgs = (data.session.messages || []).filter(m => m && m.role); if (!olderMsgs.length) { _messagesTruncated = false; return; } // Prepend older messages @@ -1063,17 +1088,56 @@ async function _loadOlderMessages() { // Ensure the full message history is loaded (for undo, export, etc). // If the session was loaded with msg_limit, this fetches all messages. +// +// Race-safety (#1937): with the endless-scroll opt-in, _loadOlderMessages +// may be in flight when this runs (e.g. user scrolled near the top, then +// hit the Start jump pill). Two coordinated guards prevent the prefetch +// from prepending duplicate messages onto our wholesale replacement: +// 1. Hold the _loadingOlder mutex around the body so a NEW prefetch +// cannot start mid-replace (entry-gate check at line ~1003 returns +// early). The mutex is also self-protecting against concurrent +// ensure-all calls from rapid double-clicks on Start. +// 2. Bump _messagesGeneration before mutating S.messages so any +// in-flight prefetch's post-await generation check bails out. async function _ensureAllMessagesLoaded() { if (!_messagesTruncated || !S.session) return; - const sid = S.session.session_id; - const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0`); - // Guard: api() may have redirected (401) and returned undefined. - if (!data || !data.session) return; - const msgs = (data.session.messages || []).filter(m => m && m.role); - S.messages = msgs; - _messagesTruncated = false; - if(S.session && S.session.session_id === sid){ - S.session.message_count = Number(data.session.message_count || msgs.length); + if (_loadingOlder) { + // A prefetch is mid-flight (between the `_loadingOlder = true` line + // and its post-await guards). Bumping the generation token now + // poisons that prefetch's continuation, but we still need to claim + // the mutex AFTER it releases. Yield until the prefetch finishes + // (its finally-block clears _loadingOlder) before fetching the full + // history ourselves. The generation bump below ensures any other + // future race against this same continuation also fails closed. + _bumpMessagesGeneration(); + while (_loadingOlder) { + await new Promise(resolve => setTimeout(resolve, 16)); + } + if (!_messagesTruncated || !S.session) return; + } + _loadingOlder = true; + try { + const sid = S.session.session_id; + const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0`); + // Guard: api() may have redirected (401) and returned undefined. + if (!data || !data.session) return; + // Session may have been switched while we awaited. Bail rather than + // overwrite the new session's messages. + if (!S.session || S.session.session_id !== sid) return; + if (_loadingSessionId !== null && _loadingSessionId !== sid) return; + const msgs = (data.session.messages || []).filter(m => m && m.role); + // Bump the generation BEFORE the wholesale replace so any racing + // prefetch (whose snapshot was taken before this call's mutex + // acquisition) sees the new value and aborts. + _bumpMessagesGeneration(); + S.messages = msgs; + _messagesTruncated = false; + _oldestIdx = 0; + if (S.session && S.session.session_id === sid) { + S.session.message_count = Number(data.session.message_count || msgs.length); + } + } finally { + _loadingOlder = false; } } diff --git a/tests/test_issue1937_endless_scroll_jumpstart_race.py b/tests/test_issue1937_endless_scroll_jumpstart_race.py new file mode 100644 index 00000000..7f3db384 --- /dev/null +++ b/tests/test_issue1937_endless_scroll_jumpstart_race.py @@ -0,0 +1,212 @@ +"""Regression test for issue #1937 — endless-scroll prefetch vs Start-jump race. + +When both ``session_jump_buttons`` and ``session_endless_scroll`` opt-ins +are enabled, ``_loadOlderMessages`` (the endless-scroll prefetch) can be in +flight when the user clicks the Start jump pill, which calls +``_ensureAllMessagesLoaded``. If the prefetch resolves AFTER the +ensure-all wholesale-replaces ``S.messages``, it would prepend a duplicate +page. + +The fix uses two coordinated guards: + +1. A ``_messagesGeneration`` token that gets bumped any time + ``S.messages`` is wholesale-replaced. ``_loadOlderMessages`` snapshots + the token before its ``await`` and re-checks afterwards; if it changed, + the prepend is aborted. + +2. ``_ensureAllMessagesLoaded`` claims the existing ``_loadingOlder`` + mutex around its body so no NEW prefetch can start mid-replace, and so + concurrent ensure-all invocations (e.g. rapid double-click on Start) + serialize cleanly. It also yields until any in-flight prefetch's + ``finally`` clears the flag before claiming the mutex itself. + +The old fix shape suggested in the issue (spin-wait on ``_loadingOlder`` +before running ensure-all) does not actually solve the race the report +describes: by the time the prefetch passes its entry-gate check, it is +already past the only point where ``_loadingOlder`` is read, so a same- +flag check inside its post-await body would be a no-op. The generation +token is the canonical pattern for invalidating async continuations and +is what this regression suite locks in. +""" + +from pathlib import Path + +REPO = Path(__file__).resolve().parents[1] +SESSIONS_JS = (REPO / "static" / "sessions.js").read_text(encoding="utf-8") + + +def _function_body(src: str, name: str) -> str: + """Slice the body of ``async function `` (or ``function ``).""" + needle_async = f"async function {name}" + needle_sync = f"function {name}" + if needle_async in src: + start = src.index(needle_async) + else: + start = src.index(needle_sync) + brace = src.index("{", start) + depth = 0 + for i in range(brace, len(src)): + if src[i] == "{": + depth += 1 + elif src[i] == "}": + depth -= 1 + if depth == 0: + return src[start : i + 1] + raise AssertionError(f"function {name!r} body not found") + + +# --------------------------------------------------------------------------- +# Generation token: declared at module scope, bumped via the helper. +# --------------------------------------------------------------------------- + +def test_generation_token_declared_at_module_scope(): + """``_messagesGeneration`` exists as a module-scoped mutable counter.""" + assert "let _messagesGeneration = 0;" in SESSIONS_JS, ( + "static/sessions.js must declare `let _messagesGeneration = 0;` so " + "_loadOlderMessages can snapshot/re-check it across its `await`. " + "See #1937." + ) + + +def test_generation_bump_helper_exists(): + """A single helper bumps the generation; both consumers route through it.""" + assert "function _bumpMessagesGeneration()" in SESSIONS_JS, ( + "static/sessions.js must define `_bumpMessagesGeneration()` so " + "wholesale-replace sites have a single, named pivot to call. See #1937." + ) + body = _function_body(SESSIONS_JS, "_bumpMessagesGeneration") + assert "_messagesGeneration" in body, ( + "_bumpMessagesGeneration must mutate _messagesGeneration" + ) + + +# --------------------------------------------------------------------------- +# _loadOlderMessages: snapshot before await, re-check after. +# --------------------------------------------------------------------------- + +def test_load_older_snapshots_generation_before_await(): + """Snapshot must be captured BEFORE the `await api(...)` call.""" + body = _function_body(SESSIONS_JS, "_loadOlderMessages") + snapshot_idx = body.index("const startGeneration = _messagesGeneration;") + await_idx = body.index("await api(") + assert snapshot_idx < await_idx, ( + "_loadOlderMessages must snapshot _messagesGeneration before its " + "`await`. Capturing it after the await defeats the race guard. " + "See #1937." + ) + + +def test_load_older_aborts_when_generation_changed(): + """Post-await guard must compare against the snapshot and abort.""" + body = _function_body(SESSIONS_JS, "_loadOlderMessages") + assert "if (_messagesGeneration !== startGeneration) return;" in body, ( + "_loadOlderMessages must bail out (without prepending) when the " + "generation token changed during its await — that is the signal " + "that S.messages was wholesale-replaced under it. See #1937." + ) + + +def test_load_older_generation_check_runs_before_prepend(): + """Generation check must come BEFORE the `S.messages = [...older, ...]` mutation.""" + body = _function_body(SESSIONS_JS, "_loadOlderMessages") + guard_idx = body.index("if (_messagesGeneration !== startGeneration) return;") + prepend_idx = body.index("S.messages = [...olderMsgs, ...S.messages];") + assert guard_idx < prepend_idx, ( + "Generation guard must short-circuit BEFORE the prepend. " + "Otherwise duplicate messages can still slip through. See #1937." + ) + + +# --------------------------------------------------------------------------- +# _ensureAllMessagesLoaded: claims the mutex, bumps the generation, yields. +# --------------------------------------------------------------------------- + +def test_ensure_all_bumps_generation_before_replace(): + """Bump must happen BEFORE `S.messages = msgs` so racing prefetch sees it.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + bump_idx = body.rindex("_bumpMessagesGeneration()") + replace_idx = body.index("S.messages = msgs;") + assert bump_idx < replace_idx, ( + "_ensureAllMessagesLoaded must bump the generation token BEFORE the " + "wholesale replace, otherwise an in-flight prefetch's post-await " + "check could read the old value and prepend duplicates. See #1937." + ) + + +def test_ensure_all_claims_loading_older_mutex(): + """The body must hold `_loadingOlder = true` so no NEW prefetch starts mid-replace.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + assert "_loadingOlder = true;" in body, ( + "_ensureAllMessagesLoaded must claim the _loadingOlder mutex so " + "the entry-gate in _loadOlderMessages short-circuits new prefetches " + "while ensure-all is mid-replace. See #1937." + ) + assert "_loadingOlder = false;" in body, ( + "_ensureAllMessagesLoaded must release the _loadingOlder mutex in " + "its finally-block. Otherwise endless-scroll silently breaks after " + "every Start-jump." + ) + + +def test_ensure_all_releases_mutex_in_finally(): + """Mutex release must live inside a `finally` so errors don't leak the lock.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + finally_idx = body.index("} finally {") + release_idx = body.index("_loadingOlder = false;", finally_idx) + assert release_idx > finally_idx, ( + "_loadingOlder release must be inside the finally-block to survive " + "thrown errors during the wholesale replace. See #1937." + ) + + +def test_ensure_all_yields_when_prefetch_in_flight(): + """When a prefetch holds the mutex, ensure-all must wait, not wholesale-replace alongside it.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + # Look for the yield-loop on _loadingOlder before the mutex claim. + yield_idx = body.index("while (_loadingOlder)") + claim_idx = body.index("_loadingOlder = true;") + assert yield_idx < claim_idx, ( + "_ensureAllMessagesLoaded must yield (poll _loadingOlder) BEFORE " + "claiming the mutex itself, so an in-flight prefetch's finally-" + "block fires and the generation guard inside that prefetch resolves " + "the race cleanly. See #1937." + ) + + +def test_ensure_all_bumps_generation_during_wait_phase(): + """Bumping during the wait poisons any in-flight prefetch immediately, even before ensure-all gets the mutex.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + # Find the _loadingOlder branch that runs when a prefetch is in flight, + # and verify it bumps the generation before the wait loop. + branch_idx = body.index("if (_loadingOlder) {") + wait_idx = body.index("while (_loadingOlder)", branch_idx) + bump_in_branch = body.index("_bumpMessagesGeneration()", branch_idx) + assert branch_idx < bump_in_branch < wait_idx, ( + "When a prefetch is in flight at entry, _ensureAllMessagesLoaded " + "must bump the generation BEFORE the wait loop so the in-flight " + "prefetch's post-await check fires the moment its api() resolves, " + "not just for future calls. See #1937." + ) + + +def test_ensure_all_resets_oldest_idx(): + """After wholesale-replacing with the full history, _oldestIdx must reset to 0.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + assert "_oldestIdx = 0;" in body, ( + "_ensureAllMessagesLoaded must reset _oldestIdx to 0 — without it, " + "a subsequent prefetch could send `msg_before=` and " + "request older messages that are already in the now-full transcript." + ) + + +def test_ensure_all_guards_against_session_switch_mid_await(): + """Same-session check must run after await — old version skipped this.""" + body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded") + await_idx = body.index("await api(") + sid_check_idx = body.index("S.session.session_id !== sid", await_idx) + replace_idx = body.index("S.messages = msgs;", await_idx) + assert await_idx < sid_check_idx < replace_idx, ( + "_ensureAllMessagesLoaded must guard against session-switch races " + "(re-check S.session.session_id after await) BEFORE wholesale-" + "replacing S.messages. The pre-fix version had no such guard." + )