mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 03:00:23 +00:00
fix(#1937): close endless-scroll prefetch vs Start-jump race with generation-token + mutex
The originally-proposed fix (gate _ensureAllMessagesLoaded on the existing _loadingOlder flag) does not actually close the race. By the time the prefetch reaches its post-await body, it has already cleared the entry- gate that reads _loadingOlder, so a same-flag check inside the resolved callback would be a no-op for an in-flight request. The actual fix is two-pronged: 1. New module-scoped _messagesGeneration counter, bumped every time S.messages is wholesale-replaced. _loadOlderMessages snapshots it BEFORE its await and re-checks after — if it changed, the prepend is aborted. This is the canonical async-invalidation pattern. 2. _ensureAllMessagesLoaded now claims the _loadingOlder mutex around its body so a new prefetch cannot start mid-replace and concurrent ensure-all calls (rapid double-click on Start) serialize cleanly. It bumps the generation token before mutating S.messages, yields until any in-flight prefetch finishes, and resets _oldestIdx so a subsequent prefetch cannot request stale older messages. Also adds the same-session / _loadingSessionId guards that the original ensure-all body was missing post-await — if the user switched sessions mid-flight, the old code would happily overwrite the new session's messages with the previous session's full history. 12 new regression tests in tests/test_issue1937_endless_scroll_jumpstart_race.py lock in: generation token declaration, bump-helper presence, snapshot- before-await ordering, post-await-abort behaviour, mutex acquisition and finally-release, yield-then-claim ordering when a prefetch is in flight, generation bump during the wait phase, _oldestIdx reset, and the new session-switch guard. Closes #1937.
This commit is contained in:
+7
-1
@@ -1,5 +1,11 @@
|
||||
# Hermes Web UI -- Changelog
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
### Fixed
|
||||
|
||||
- **#1937 — Race: endless-scroll prefetch vs Start-jump's `_ensureAllMessagesLoaded` could duplicate messages.** With both `session_jump_buttons` AND `session_endless_scroll` enabled, an in-flight `_loadOlderMessages` prefetch racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could prepend a duplicate page if the prefetch resolved last. The naive fix suggested in the report (gate ensure-all on the existing `_loadingOlder` flag) does not actually close the race — by the time the prefetch reaches its post-await body, it has already cleared the entry-gate that reads `_loadingOlder`, so a same-flag check inside the resolved callback is a no-op. The actual fix is a generation-token + mutex pair: (1) `_loadOlderMessages` snapshots a new module-scoped `_messagesGeneration` counter BEFORE its `await api(...)` and re-checks it after, aborting the prepend if any wholesale-replace bumped the token mid-flight; (2) `_ensureAllMessagesLoaded` claims the `_loadingOlder` mutex around its body (so a NEW prefetch cannot start mid-replace, and concurrent ensure-all calls from rapid double-clicks on Start serialize cleanly), bumps the generation token before mutating `S.messages`, yields until any in-flight prefetch's `finally`-block releases the mutex, and resets `_oldestIdx` so a subsequent prefetch cannot send a stale `msg_before` index. Also adds the same-session and `_loadingSessionId` guards that the original ensure-all body was missing post-await. (`static/sessions.js`, `tests/test_issue1937_endless_scroll_jumpstart_race.py` — 12 new regression tests)
|
||||
|
||||
## [v0.51.30] — 2026-05-08 — 3-PR contributor batch (Release G: offline recovery + PWA hardening + opt-in session jump buttons + opt-in endless-scroll)
|
||||
|
||||
### Added (3 PRs, all from @ai-ag2026)
|
||||
@@ -31,7 +37,7 @@
|
||||
|
||||
### Follow-up items filed (non-blocking)
|
||||
|
||||
- **Race between endless-scroll prefetch and Start-jump's `_ensureAllMessagesLoaded`** — with both opt-ins ON, an in-flight prefetch (started by 1.5x-viewport trigger) racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could produce duplicate messages if the prefetch resolves last. Narrow window, but the fix is to gate `_ensureAllMessagesLoaded` on the existing `_loadingOlder` flag.
|
||||
- **Race between endless-scroll prefetch and Start-jump's `_ensureAllMessagesLoaded`** — with both opt-ins ON, an in-flight prefetch (started by 1.5x-viewport trigger) racing with `jumpToSessionStart` → `_ensureAllMessagesLoaded` could produce duplicate messages if the prefetch resolves last. Narrow window, but the fix is to gate `_ensureAllMessagesLoaded` on the existing `_loadingOlder` flag. **Resolved in Unreleased — see #1937 entry above; final fix uses generation-token + mutex rather than the originally-suggested flag gate, which would not have closed the race.**
|
||||
- **#1928 locale parity** — `session_jump_*` and `settings_*_session_jump_buttons` keys are English literals in ja/ru/es/de/zh/zh-Hant/pt/ko. Default-OFF + English fallback works, but breaks the locale-parity standard set by #1929 and #1891 in the same release.
|
||||
|
||||
|
||||
|
||||
+73
-9
@@ -998,6 +998,19 @@ let _loadingOlder = false;
|
||||
// oldest message currently loaded in S.messages. Starts at 0 when all
|
||||
// messages are loaded, or > 0 when truncated by msg_limit.
|
||||
let _oldestIdx = 0;
|
||||
// Generation token bumped every time S.messages is wholesale-replaced
|
||||
// (rather than incrementally extended). _loadOlderMessages snapshots it
|
||||
// before its `await` and re-checks after, so a late-resolving prefetch
|
||||
// does not prepend onto a transcript that was rebuilt under it
|
||||
// (e.g. by _ensureAllMessagesLoaded after a Start-jump). See #1937.
|
||||
let _messagesGeneration = 0;
|
||||
function _bumpMessagesGeneration() {
|
||||
// Wrap to keep the counter bounded; the only operation that matters is
|
||||
// strict inequality between the snapshot and the post-await read, so any
|
||||
// monotonic bump is sufficient.
|
||||
_messagesGeneration = (_messagesGeneration + 1) | 0;
|
||||
return _messagesGeneration;
|
||||
}
|
||||
|
||||
async function _loadOlderMessages() {
|
||||
if (_loadingOlder || !_messagesTruncated) return;
|
||||
@@ -1005,6 +1018,11 @@ async function _loadOlderMessages() {
|
||||
if (!sid || !S.messages.length) return;
|
||||
if (_oldestIdx <= 0) { _messagesTruncated = false; return; }
|
||||
_loadingOlder = true;
|
||||
// Snapshot the generation BEFORE we await. If S.messages is wholesale
|
||||
// replaced while the request is in flight, the post-await check below
|
||||
// bails out so we never prepend stale older messages onto a freshly
|
||||
// rebuilt transcript (#1937).
|
||||
const startGeneration = _messagesGeneration;
|
||||
try {
|
||||
const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0&msg_before=${_oldestIdx}&msg_limit=${_INITIAL_MSG_LIMIT}`);
|
||||
// Guard: api() may have redirected (401) and returned undefined.
|
||||
@@ -1017,6 +1035,13 @@ async function _loadOlderMessages() {
|
||||
if (!data || !data.session) return;
|
||||
if (!S.session || S.session.session_id !== sid) return;
|
||||
if (_loadingSessionId !== null && _loadingSessionId !== sid) return;
|
||||
// Generation guard: another code path (typically jumpToSessionStart →
|
||||
// _ensureAllMessagesLoaded) may have replaced S.messages while we were
|
||||
// awaiting. Prepending older messages onto that replacement would
|
||||
// duplicate the head of the transcript. Detect via the generation
|
||||
// counter and abort cleanly. _oldestIdx and _messagesTruncated were
|
||||
// already reset by the wholesale-replace path, so no rollback needed.
|
||||
if (_messagesGeneration !== startGeneration) return;
|
||||
const olderMsgs = (data.session.messages || []).filter(m => m && m.role);
|
||||
if (!olderMsgs.length) { _messagesTruncated = false; return; }
|
||||
// Prepend older messages
|
||||
@@ -1063,17 +1088,56 @@ async function _loadOlderMessages() {
|
||||
|
||||
// Ensure the full message history is loaded (for undo, export, etc).
|
||||
// If the session was loaded with msg_limit, this fetches all messages.
|
||||
//
|
||||
// Race-safety (#1937): with the endless-scroll opt-in, _loadOlderMessages
|
||||
// may be in flight when this runs (e.g. user scrolled near the top, then
|
||||
// hit the Start jump pill). Two coordinated guards prevent the prefetch
|
||||
// from prepending duplicate messages onto our wholesale replacement:
|
||||
// 1. Hold the _loadingOlder mutex around the body so a NEW prefetch
|
||||
// cannot start mid-replace (entry-gate check at line ~1003 returns
|
||||
// early). The mutex is also self-protecting against concurrent
|
||||
// ensure-all calls from rapid double-clicks on Start.
|
||||
// 2. Bump _messagesGeneration before mutating S.messages so any
|
||||
// in-flight prefetch's post-await generation check bails out.
|
||||
async function _ensureAllMessagesLoaded() {
|
||||
if (!_messagesTruncated || !S.session) return;
|
||||
const sid = S.session.session_id;
|
||||
const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0`);
|
||||
// Guard: api() may have redirected (401) and returned undefined.
|
||||
if (!data || !data.session) return;
|
||||
const msgs = (data.session.messages || []).filter(m => m && m.role);
|
||||
S.messages = msgs;
|
||||
_messagesTruncated = false;
|
||||
if(S.session && S.session.session_id === sid){
|
||||
S.session.message_count = Number(data.session.message_count || msgs.length);
|
||||
if (_loadingOlder) {
|
||||
// A prefetch is mid-flight (between the `_loadingOlder = true` line
|
||||
// and its post-await guards). Bumping the generation token now
|
||||
// poisons that prefetch's continuation, but we still need to claim
|
||||
// the mutex AFTER it releases. Yield until the prefetch finishes
|
||||
// (its finally-block clears _loadingOlder) before fetching the full
|
||||
// history ourselves. The generation bump below ensures any other
|
||||
// future race against this same continuation also fails closed.
|
||||
_bumpMessagesGeneration();
|
||||
while (_loadingOlder) {
|
||||
await new Promise(resolve => setTimeout(resolve, 16));
|
||||
}
|
||||
if (!_messagesTruncated || !S.session) return;
|
||||
}
|
||||
_loadingOlder = true;
|
||||
try {
|
||||
const sid = S.session.session_id;
|
||||
const data = await api(`/api/session?session_id=${encodeURIComponent(sid)}&messages=1&resolve_model=0`);
|
||||
// Guard: api() may have redirected (401) and returned undefined.
|
||||
if (!data || !data.session) return;
|
||||
// Session may have been switched while we awaited. Bail rather than
|
||||
// overwrite the new session's messages.
|
||||
if (!S.session || S.session.session_id !== sid) return;
|
||||
if (_loadingSessionId !== null && _loadingSessionId !== sid) return;
|
||||
const msgs = (data.session.messages || []).filter(m => m && m.role);
|
||||
// Bump the generation BEFORE the wholesale replace so any racing
|
||||
// prefetch (whose snapshot was taken before this call's mutex
|
||||
// acquisition) sees the new value and aborts.
|
||||
_bumpMessagesGeneration();
|
||||
S.messages = msgs;
|
||||
_messagesTruncated = false;
|
||||
_oldestIdx = 0;
|
||||
if (S.session && S.session.session_id === sid) {
|
||||
S.session.message_count = Number(data.session.message_count || msgs.length);
|
||||
}
|
||||
} finally {
|
||||
_loadingOlder = false;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1,212 @@
|
||||
"""Regression test for issue #1937 — endless-scroll prefetch vs Start-jump race.
|
||||
|
||||
When both ``session_jump_buttons`` and ``session_endless_scroll`` opt-ins
|
||||
are enabled, ``_loadOlderMessages`` (the endless-scroll prefetch) can be in
|
||||
flight when the user clicks the Start jump pill, which calls
|
||||
``_ensureAllMessagesLoaded``. If the prefetch resolves AFTER the
|
||||
ensure-all wholesale-replaces ``S.messages``, it would prepend a duplicate
|
||||
page.
|
||||
|
||||
The fix uses two coordinated guards:
|
||||
|
||||
1. A ``_messagesGeneration`` token that gets bumped any time
|
||||
``S.messages`` is wholesale-replaced. ``_loadOlderMessages`` snapshots
|
||||
the token before its ``await`` and re-checks afterwards; if it changed,
|
||||
the prepend is aborted.
|
||||
|
||||
2. ``_ensureAllMessagesLoaded`` claims the existing ``_loadingOlder``
|
||||
mutex around its body so no NEW prefetch can start mid-replace, and so
|
||||
concurrent ensure-all invocations (e.g. rapid double-click on Start)
|
||||
serialize cleanly. It also yields until any in-flight prefetch's
|
||||
``finally`` clears the flag before claiming the mutex itself.
|
||||
|
||||
The old fix shape suggested in the issue (spin-wait on ``_loadingOlder``
|
||||
before running ensure-all) does not actually solve the race the report
|
||||
describes: by the time the prefetch passes its entry-gate check, it is
|
||||
already past the only point where ``_loadingOlder`` is read, so a same-
|
||||
flag check inside its post-await body would be a no-op. The generation
|
||||
token is the canonical pattern for invalidating async continuations and
|
||||
is what this regression suite locks in.
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
REPO = Path(__file__).resolve().parents[1]
|
||||
SESSIONS_JS = (REPO / "static" / "sessions.js").read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def _function_body(src: str, name: str) -> str:
|
||||
"""Slice the body of ``async function <name>`` (or ``function <name>``)."""
|
||||
needle_async = f"async function {name}"
|
||||
needle_sync = f"function {name}"
|
||||
if needle_async in src:
|
||||
start = src.index(needle_async)
|
||||
else:
|
||||
start = src.index(needle_sync)
|
||||
brace = src.index("{", start)
|
||||
depth = 0
|
||||
for i in range(brace, len(src)):
|
||||
if src[i] == "{":
|
||||
depth += 1
|
||||
elif src[i] == "}":
|
||||
depth -= 1
|
||||
if depth == 0:
|
||||
return src[start : i + 1]
|
||||
raise AssertionError(f"function {name!r} body not found")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Generation token: declared at module scope, bumped via the helper.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def test_generation_token_declared_at_module_scope():
|
||||
"""``_messagesGeneration`` exists as a module-scoped mutable counter."""
|
||||
assert "let _messagesGeneration = 0;" in SESSIONS_JS, (
|
||||
"static/sessions.js must declare `let _messagesGeneration = 0;` so "
|
||||
"_loadOlderMessages can snapshot/re-check it across its `await`. "
|
||||
"See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_generation_bump_helper_exists():
|
||||
"""A single helper bumps the generation; both consumers route through it."""
|
||||
assert "function _bumpMessagesGeneration()" in SESSIONS_JS, (
|
||||
"static/sessions.js must define `_bumpMessagesGeneration()` so "
|
||||
"wholesale-replace sites have a single, named pivot to call. See #1937."
|
||||
)
|
||||
body = _function_body(SESSIONS_JS, "_bumpMessagesGeneration")
|
||||
assert "_messagesGeneration" in body, (
|
||||
"_bumpMessagesGeneration must mutate _messagesGeneration"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _loadOlderMessages: snapshot before await, re-check after.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def test_load_older_snapshots_generation_before_await():
|
||||
"""Snapshot must be captured BEFORE the `await api(...)` call."""
|
||||
body = _function_body(SESSIONS_JS, "_loadOlderMessages")
|
||||
snapshot_idx = body.index("const startGeneration = _messagesGeneration;")
|
||||
await_idx = body.index("await api(")
|
||||
assert snapshot_idx < await_idx, (
|
||||
"_loadOlderMessages must snapshot _messagesGeneration before its "
|
||||
"`await`. Capturing it after the await defeats the race guard. "
|
||||
"See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_load_older_aborts_when_generation_changed():
|
||||
"""Post-await guard must compare against the snapshot and abort."""
|
||||
body = _function_body(SESSIONS_JS, "_loadOlderMessages")
|
||||
assert "if (_messagesGeneration !== startGeneration) return;" in body, (
|
||||
"_loadOlderMessages must bail out (without prepending) when the "
|
||||
"generation token changed during its await — that is the signal "
|
||||
"that S.messages was wholesale-replaced under it. See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_load_older_generation_check_runs_before_prepend():
|
||||
"""Generation check must come BEFORE the `S.messages = [...older, ...]` mutation."""
|
||||
body = _function_body(SESSIONS_JS, "_loadOlderMessages")
|
||||
guard_idx = body.index("if (_messagesGeneration !== startGeneration) return;")
|
||||
prepend_idx = body.index("S.messages = [...olderMsgs, ...S.messages];")
|
||||
assert guard_idx < prepend_idx, (
|
||||
"Generation guard must short-circuit BEFORE the prepend. "
|
||||
"Otherwise duplicate messages can still slip through. See #1937."
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _ensureAllMessagesLoaded: claims the mutex, bumps the generation, yields.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def test_ensure_all_bumps_generation_before_replace():
|
||||
"""Bump must happen BEFORE `S.messages = msgs` so racing prefetch sees it."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
bump_idx = body.rindex("_bumpMessagesGeneration()")
|
||||
replace_idx = body.index("S.messages = msgs;")
|
||||
assert bump_idx < replace_idx, (
|
||||
"_ensureAllMessagesLoaded must bump the generation token BEFORE the "
|
||||
"wholesale replace, otherwise an in-flight prefetch's post-await "
|
||||
"check could read the old value and prepend duplicates. See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_claims_loading_older_mutex():
|
||||
"""The body must hold `_loadingOlder = true` so no NEW prefetch starts mid-replace."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
assert "_loadingOlder = true;" in body, (
|
||||
"_ensureAllMessagesLoaded must claim the _loadingOlder mutex so "
|
||||
"the entry-gate in _loadOlderMessages short-circuits new prefetches "
|
||||
"while ensure-all is mid-replace. See #1937."
|
||||
)
|
||||
assert "_loadingOlder = false;" in body, (
|
||||
"_ensureAllMessagesLoaded must release the _loadingOlder mutex in "
|
||||
"its finally-block. Otherwise endless-scroll silently breaks after "
|
||||
"every Start-jump."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_releases_mutex_in_finally():
|
||||
"""Mutex release must live inside a `finally` so errors don't leak the lock."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
finally_idx = body.index("} finally {")
|
||||
release_idx = body.index("_loadingOlder = false;", finally_idx)
|
||||
assert release_idx > finally_idx, (
|
||||
"_loadingOlder release must be inside the finally-block to survive "
|
||||
"thrown errors during the wholesale replace. See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_yields_when_prefetch_in_flight():
|
||||
"""When a prefetch holds the mutex, ensure-all must wait, not wholesale-replace alongside it."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
# Look for the yield-loop on _loadingOlder before the mutex claim.
|
||||
yield_idx = body.index("while (_loadingOlder)")
|
||||
claim_idx = body.index("_loadingOlder = true;")
|
||||
assert yield_idx < claim_idx, (
|
||||
"_ensureAllMessagesLoaded must yield (poll _loadingOlder) BEFORE "
|
||||
"claiming the mutex itself, so an in-flight prefetch's finally-"
|
||||
"block fires and the generation guard inside that prefetch resolves "
|
||||
"the race cleanly. See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_bumps_generation_during_wait_phase():
|
||||
"""Bumping during the wait poisons any in-flight prefetch immediately, even before ensure-all gets the mutex."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
# Find the _loadingOlder branch that runs when a prefetch is in flight,
|
||||
# and verify it bumps the generation before the wait loop.
|
||||
branch_idx = body.index("if (_loadingOlder) {")
|
||||
wait_idx = body.index("while (_loadingOlder)", branch_idx)
|
||||
bump_in_branch = body.index("_bumpMessagesGeneration()", branch_idx)
|
||||
assert branch_idx < bump_in_branch < wait_idx, (
|
||||
"When a prefetch is in flight at entry, _ensureAllMessagesLoaded "
|
||||
"must bump the generation BEFORE the wait loop so the in-flight "
|
||||
"prefetch's post-await check fires the moment its api() resolves, "
|
||||
"not just for future calls. See #1937."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_resets_oldest_idx():
|
||||
"""After wholesale-replacing with the full history, _oldestIdx must reset to 0."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
assert "_oldestIdx = 0;" in body, (
|
||||
"_ensureAllMessagesLoaded must reset _oldestIdx to 0 — without it, "
|
||||
"a subsequent prefetch could send `msg_before=<stale-idx>` and "
|
||||
"request older messages that are already in the now-full transcript."
|
||||
)
|
||||
|
||||
|
||||
def test_ensure_all_guards_against_session_switch_mid_await():
|
||||
"""Same-session check must run after await — old version skipped this."""
|
||||
body = _function_body(SESSIONS_JS, "_ensureAllMessagesLoaded")
|
||||
await_idx = body.index("await api(")
|
||||
sid_check_idx = body.index("S.session.session_id !== sid", await_idx)
|
||||
replace_idx = body.index("S.messages = msgs;", await_idx)
|
||||
assert await_idx < sid_check_idx < replace_idx, (
|
||||
"_ensureAllMessagesLoaded must guard against session-switch races "
|
||||
"(re-check S.session.session_id after await) BEFORE wholesale-"
|
||||
"replacing S.messages. The pre-fix version had no such guard."
|
||||
)
|
||||
Reference in New Issue
Block a user