Replace the earlier frontend-reset approach with a backend side-channel
approach that preserves the queue (event, data) tuple shape.
Problem (Opus catch):
- Live SSE frames emitted by _sse() in api/streaming.py:2296 carried no
'id:' field. Only journal-replay frames (via _sse_with_id) emitted IDs.
- Frontend's _lastRunJournalSeq cursor stayed at 0 during live streaming.
- Mid-stream error → reconnect-to-replay arrived with after_seq=0.
- Server replayed every journaled event from seq 1.
- assistantText (closure-scoped) had already accumulated all live tokens,
so the replayed events rendered the output a second time.
Fix:
- api/config.py: STREAM_LAST_EVENT_ID: dict = {} module-level dict.
- api/streaming.py put(): capture journal event_id, write to
STREAM_LAST_EVENT_ID[stream_id]. Keep queue tuple as (event, data).
- api/routes.py _handle_sse_stream: read STREAM_LAST_EVENT_ID[stream_id]
at emit time, use _sse_with_id when set.
- api/streaming.py finally block: pop STREAM_LAST_EVENT_ID for cleanup.
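A minimal sketch of the side-channel idea described in the Fix list, not
the actual code. The helper names sse, sse_with_id, put, emit_next and
cleanup are hypothetical stand-ins for _sse, _sse_with_id, put(),
_handle_sse_stream and the finally block; only STREAM_LAST_EVENT_ID and
the (event, data) tuple shape come from this message. It is
single-threaded and assumes one put per emit.

```python
import queue

# Hypothetical stand-in for the module-level dict in api/config.py.
STREAM_LAST_EVENT_ID: dict = {}


def sse(event: str, data: str) -> str:
    """Format an SSE frame without an 'id:' field (legacy shape)."""
    return f"event: {event}\ndata: {data}\n\n"


def sse_with_id(event: str, data: str, event_id: int) -> str:
    """Format an SSE frame that carries the journal sequence as 'id:'."""
    return f"id: {event_id}\nevent: {event}\ndata: {data}\n\n"


def put(q: queue.Queue, stream_id: str, event: str, data: str, journal_seq: int) -> None:
    # Record the journal sequence out-of-band; the queue item stays a 2-tuple.
    STREAM_LAST_EVENT_ID[stream_id] = journal_seq
    q.put((event, data))


def emit_next(q: queue.Queue, stream_id: str) -> str:
    event, data = q.get()  # legacy 'event, data = q.get()' unpacking still works
    event_id = STREAM_LAST_EVENT_ID.get(stream_id)
    if event_id is not None:
        return sse_with_id(event, data, event_id)
    return sse(event, data)


def cleanup(stream_id: str) -> None:
    # Mirrors the finally-block pop so the dict does not grow without bound.
    STREAM_LAST_EVENT_ID.pop(stream_id, None)
```

With this shape, a consumer that unpacks two values keeps working, and a
stream with no recorded ID falls back to the id-less frame.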
Why side-channel instead of 3-tuple:
- Earlier attempt (queue tuple → (event, data, event_id)) broke 4 existing
tests: test_cancel_interrupt, test_sprint42, test_sprint51,
test_issue1857_usage_overwrite. These all unpack 'event, data = q.get()'.
- Frontend-reset approach (reset assistantText before replay) broke 3
other tests: test_smooth_text_fade, test_streaming_markdown,
test_streaming_race_fix. _wireSSE must NOT reset accumulators because
legacy reconnect doesn't replay events; only journal-replay does.
Side-channel preserves both invariants:
- Queue contract stays (event, data) — legacy consumers unbroken.
- Frontend accumulators stay alive on _wireSSE — legacy reconnect unbroken.
- Live SSE emits 'id:' so the journal cursor advances correctly.
6 regression tests added in test_stage364_opus_live_sse_event_id.py.
1 existing test (test_run_journal_streaming_static.test_streaming_journals_sse_events_before_queue_delivery) updated to be tuple-shape-agnostic.
Test results:
- Full pytest: 5713 passed, 10 skipped, 1 xfailed, 2 xpassed, 0 failed
- Previously-failing 5 tests: ALL PASS
- 6 new regression tests: ALL PASS
Replace the hardcoded Skyly cancellation wording with the configured bot_name from settings, falling back to Hermes when unset.
Keep the client-side fallback in sync by using window._botName if the session refresh after cancellation fails.
Co-authored-by: Obryn 🐉 <obryn-ai@dotbeeps.dev>
Opus stage-360 review caught that the docstring at api/streaming.py:40-43
said 'around the entire agent run' which is no longer accurate after the
narrow-lock refactor. The lock is now held only briefly for the env-mutation
critical section; the agent runs outside the lock and the finally block
re-acquires to atomically restore env vars.
Docstring now points to both narrow-lock implementations as references:
- _run_agent_streaming at line ~2719 (the original pattern)
- profile_env_for_background_worker at api/profiles.py:715 (added stage-360)
#2299 introduced profile_env_for_background_worker() in api/profiles.py and
changed _ENV_LOCK from threading.Lock() to threading.RLock(). Both changes
were incorrect:
1. RLock masked rather than fixed the underlying deadlock. The QA
test_env_lock_is_non_reentrant test exists precisely to enforce
non-reentrance — RLock would let a single thread hold _ENV_LOCK across
nested critical sections, which hides bugs while still allowing
different-thread races.
2. The original context manager held _ENV_LOCK for the ENTIRE 'yield'
duration, meaning the lock was held for the full background worker's
runtime (title generation, compression, update summary — possibly
many seconds). That blocked ALL other sessions on _ENV_LOCK, which
the QA test_third_message_completes runtime test caught as a timeout
on the third sequential message.
Fix: mirror the narrow-lock pattern from _run_agent_streaming:
- Acquire _ENV_LOCK only for env mutation (set runtime_env + patch
skill modules)
- Release immediately, yield to worker (no lock held)
- Reacquire in finally to restore env + skill modules
Restored _ENV_LOCK to threading.Lock(). All 20 QA tests now pass,
including test_third_message_completes (previously timing out, now 35s).
The FakeAgent in test_issue1857_usage_overwrite returned only 2 messages
(user + assistant) without the conversation history. The real agent always
returns the full history plus new messages. This mismatch caused the new
_has_new_assistant_reply helper (which checks only messages beyond the
pre-turn offset) to see len(result)==len(prev) and incorrectly flag the
turn as a silent failure.
Fix: prepend conversation_history to the FakeAgent's response so the
message list mirrors production behavior.
When a provider error (401/429/rate-limit) causes the agent to return
without producing a new assistant reply, the WebUI should emit an
apperror event so the user sees an inline error. However, the detection
logic scanned ALL messages in result['messages'] — which includes the
full conversation history. If any prior turn had an assistant response,
_assistant_added would be True and the apperror would be silently
skipped, leaving the user staring at a blank response.
Extract a helper _has_new_assistant_reply(all_messages, prev_count)
that only inspects messages beyond the pre-turn history offset. Apply
it to both the main detection path and the self-heal/retry path.
Tests: 15 new cases covering history masking, empty content, whitespace,
edge-case shrinks, and multi-assistant scenarios.
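A rough sketch of what such a helper could look like. The real
_has_new_assistant_reply may differ; this version assumes each message
is a dict with 'role' and a string 'content', which is an assumption
made for illustration.

```python
def has_new_assistant_reply(all_messages: list, prev_count: int) -> bool:
    """Return True only if a non-empty assistant message exists past prev_count."""
    # Slicing past the end yields [], so a shrunken list counts as "no reply".
    new_messages = all_messages[prev_count:]
    for msg in new_messages:
        if msg.get("role") != "assistant":
            continue
        content = msg.get("content")
        # Empty or whitespace-only content does not count as a reply.
        if isinstance(content, str) and content.strip():
            return True
    return False
```

Assistant messages from earlier turns sit below prev_count, so they can
no longer mask a failed turn.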
Opus identified that PR #2227's preservation block had two related bugs in
the parent_session_id handling:
1. During preservation save: code did
    _old_parent = s.parent_session_id
    s.parent_session_id = None
    s.save(touch_updated_at=False, skip_index=True)
    s.parent_session_id = _old_parent
The save persisted parent=None to disk. The in-memory restoration didn't
reach the disk copy. Result: a /branch fork session that subsequently
compressed lost its 'Forked from X' badge on the preserved old snapshot.
2. Stamping the continuation: code did
    if not s.parent_session_id:
        s.parent_session_id = old_sid
The 'if not' guard skipped the stamp when the session already had a
parent_session_id from a prior fork. Result: fork-of-fork compression
broke lineage — the continuation jumped back to the original fork parent
instead of the just-preserved immediate predecessor snapshot.
Fix (matches Opus's recommendation):
- Remove the parent clearing during preservation save (preserve as-is)
- Drop the 'if not' guard; always stamp continuation to old_sid
This makes the lineage chain consistent: new → old → old.parent → ... root.
Traversal from the continuation always walks through the just-preserved
snapshot to get to its parent's parent, never jumping over the snapshot.
Two new regression tests pin both invariants:
- test_parent_session_id_stamped_unconditionally (no 'if not' guard)
- test_old_session_parent_preserved_during_archive_save (no parent=None)
Both pass against the fix. All 8 tests in the file pass.
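The lineage invariant can be illustrated with a simplified model. The
Session dataclass, compress and lineage below are hypothetical; the real
code mutates a single session object in place rather than building a
new one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Session:
    session_id: str
    parent_session_id: Optional[str] = None


def compress(old: Session, new_sid: str) -> Session:
    """Create the continuation; the preserved snapshot keeps its own parent."""
    # Always stamp the continuation with old.session_id (no 'if not' guard).
    return Session(session_id=new_sid, parent_session_id=old.session_id)


def lineage(sid: Optional[str], sessions: Dict[str, Session]) -> List[str]:
    """Walk parent links from sid back to the root."""
    chain = []
    while sid is not None:
        chain.append(sid)
        sid = sessions[sid].parent_session_id
    return chain
```

For a fork of root that is later compressed, the walk from the
continuation visits the preserved fork snapshot before reaching root.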
The previous implementation renamed old_sid.json → new_sid.json during
context compression, destroying the only persistent copy of the full
conversation history. If the summarisation LLM call also failed, the
user was left with zero recoverable messages.
Fix:
- Remove the destructive old_path.rename(new_path) call
- Preserve old_sid.json as an immutable pre-compression archive
- Create new_sid.json as a fresh file via s.save()
- Set parent_session_id on the continuation session for lineage
- Save in-memory messages to old_sid.json if they're newer than disk
Test: test_issue2223_compression_no_rename.py (6 tests, all passing)
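A minimal sketch of the non-destructive flow, assuming sessions are
stored as <sid>.json files in one directory. compress_session and the
JSON layout are hypothetical; the real code goes through s.save().

```python
import json
from pathlib import Path


def compress_session(session_dir: Path, old_sid: str, new_sid: str,
                     summary_messages: list) -> Path:
    """Write the continuation as a fresh file; never rename the old one."""
    new_path = session_dir / f"{new_sid}.json"
    continuation = {
        "session_id": new_sid,
        "parent_session_id": old_sid,  # lineage back to the archive
        "messages": summary_messages,
    }
    # The old <old_sid>.json is not touched here, so the full history
    # survives even if producing summary_messages had failed upstream.
    new_path.write_text(json.dumps(continuation))
    return new_path
```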
Refs #2215 Fix B: remove the mid-response stripping hazard without losing leading multi-line wrapper cleanup.
The pattern now strips only a leading 'the user is asking' wrapper line and preserves the visible answer that follows. Add regression coverage for both the leading-wrapper and mid-response prose cases.
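As an illustration only, an anchored pattern along these lines strips
the wrapper when it opens the response and leaves the same phrase alone
mid-response. The regex and the name strip_leading_wrapper are
assumptions; the actual pattern is not shown in this message, and this
sketch handles a single wrapper line only.

```python
import re

# \A anchors at the very start of the text, so the same phrase appearing
# later in the response can never match.
_LEADING_WRAPPER = re.compile(r"\A[ \t]*the user is asking[^\n]*\n+", re.IGNORECASE)


def strip_leading_wrapper(text: str) -> str:
    """Drop one leading 'the user is asking ...' line, keep everything else."""
    return _LEADING_WRAPPER.sub("", text, count=1)
```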
Opus flagged that PR #2151's cancel-handler partial-dedup loop used a
substring check that was too broad: any short prior assistant reply
('OK', 'Here is the answer:') would dedup a longer new partial containing
it, silently dropping the partial and resurrecting the #893 data-loss bug.
Tightened to only dedup against actual prior _partial=True markers with
exact (whitespace-stripped) content match. Three new regression tests
added (short-non-partial-prefix-does-not-dedup, exact-partial-match-still-
dedups, same-content-non-partial-does-not-dedup).
10/10 partial-cancel tests pass after the fix. Also updated CHANGELOG with
the conflict-resolution notes for #2151 vs #2136 and the #2178 test-fix.
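The tightened rule can be sketched as below. should_dedup_partial is a
hypothetical name, and the message shape (dicts with 'role', 'content'
and an optional '_partial' flag) is assumed from the description above.

```python
def should_dedup_partial(new_partial: str, prior_messages: list) -> bool:
    """Dedup only against prior _partial=True markers with identical content."""
    target = new_partial.strip()
    for msg in prior_messages:
        if msg.get("role") != "assistant" or not msg.get("_partial"):
            continue  # ordinary replies never suppress a new partial
        if (msg.get("content") or "").strip() == target:
            return True
    return False
```

A short earlier reply such as 'OK' is not a partial marker, so it can no
longer cause a longer partial that happens to contain it to be dropped.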
fix: clarify cancelled chat turn status (Jordan-SkyLF)
Conflict resolution on api/streaming.py:4549-4567 (the cancel-handler
ownership guard). Both this PR and the already-shipped PR #2136 add a
guard at the same site against stale stream writebacks, from different
angles:
- PR #2136 (HEAD): _stream_writeback_is_current(_cs, stream_id) is the
strictly stronger check: it requires active_stream_id to equal stream_id.
- PR #2151: 'worker won the race' check via (active_stream_id != stream_id
and not pending_user_message), with _emit_cancel_event = False to suppress
the terminal cancel event.
Resolution merges both: keep #2136's strictly stronger condition for skip
detection, and adopt #2151's _emit_cancel_event = False semantics so that
when the writeback is skipped the terminal cancel event is suppressed too
(the client may already have received the successful done payload).
55/55 tests pass across cancelled-turn-status + stale-stream-writeback +
the four cancel/data-loss sibling test files.
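A simplified model of the merged guard, for illustration only.
resolve_cancel is a hypothetical function; the real handler works on the
session object and the _emit_cancel_event flag inside api/streaming.py.

```python
from typing import Optional, Tuple


def resolve_cancel(active_stream_id: Optional[str], stream_id: str) -> Tuple[bool, bool]:
    """Return (write_back, emit_cancel_event) for a cancel on stream_id."""
    # Token equality, in the spirit of _stream_writeback_is_current.
    is_current = active_stream_id == stream_id
    if not is_current:
        # Stale stream: skip the writeback and suppress the cancel event.
        return (False, False)
    return (True, True)
```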