runtime: recover oversized user message after wire/media overflow#2821
runtime: recover oversized user message after wire/media overflow#2821trungutt wants to merge 2 commits into
Conversation
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
2 medium-severity issues found in the overflow recovery implementation.
a1d06ad to
18a703a
Compare
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two medium-severity findings in the new overflow-recovery code. The core approach — scrubbing oversized messages in-place, persisting the rewrite, and emitting a Warning event — is sound and the two learned patterns (wire/media scrub independence from compaction config; top-level Content vs MultiContent text scrubbing) are both correctly implemented.
b22d0fb to
c724506
Compare
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
Reviewed the overflow recovery changes across pkg/runtime/overflow_recovery.go, pkg/runtime/loop_steps.go, pkg/session/session.go, and the accompanying tests.
What was checked:
- Correctness of
findPersistedMessagewhen sub-sessions are present —GetAllMessages()appends top-level messages before sub-session messages, so even in a hypothetical ID-collision scenario the first match is always the correctly-rewritten top-level message ✅ - Notification ordering (
emitScrubNoticefires unconditionally after in-memory rewrite, regardless of persistence outcome) — correctly implemented per design intent ✅ UpdateMessagecall site — called with the post-mutation message content,messageID == 0guard handles un-persisted synthesised messages ✅- Store-write failure path — non-fatal, logged with
slog.WarnContext, and user is still notified via the scrub notice ✅ - Test coverage — happy path, persist-fails path, and no-user-message path all covered ✅
No confirmed or likely bugs found. The implementation correctly addresses the session-poisoning scenario described in the PR description.
Auto-compaction is only useful when the rejection is a token-count overflow — summarising older turns reduces the input token count. For wire-level overflow ([OverflowKindWire]) the request body itself exceeds the provider's cap, and the latest turn alone is over the limit; the compaction call would have to send the same oversized history and would also be rejected. For media overflow ([OverflowKindMedia]) we have no media-stripping during compaction today, so a retry would resend the same attachment and fail again. In both cases the recovery attempt always fails, then we surface the error anyway, while having spent an extra provider call and several seconds of wall-clock latency. This change skips compaction for those two kinds and surfaces the error directly. The token-overflow path is unchanged.
c724506 to
b334412
Compare
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🔴 CRITICAL
1 high-severity and 1 low-severity finding in the new overflow recovery code.
| // same object that was just mutated in place — race-safe because the | ||
| // caller mutates and reads under the same goroutine within | ||
| // handleStreamError. | ||
| func findPersistedMessage(sess *session.Session, id int64) *session.Message { |
There was a problem hiding this comment.
[HIGH] findPersistedMessage may always return nil — SQLite-backed sessions never get the in-memory Message.ID populated
findPersistedMessage searches sess.GetAllMessages() for a message whose ID matches the one returned by RewriteLatestUserMessage. This works when Message.ID is populated in the in-memory session, but it is not for SQLite-backed sessions:
SQLiteSessionStore.AddMessageinserts the row and returnsLastInsertId(), but it does not callsession.AddMessageitself — the runtime'sPersistenceObservercallsstore.AddMessageand discards the returned ID without writing it back into the in-memorysession.Message.ID.loadSessionItems(the path used on reload from disk) reconstructsMessagestructs without scanning theidcolumn, soMessage.IDstays0for every row loaded from SQLite.
Consequence: For any SQLite-backed session, item.Message.ID is 0 in memory. RewriteLatestUserMessage therefore returns (0, true). The guard if messageID == 0 || r.sessionStore == nil fires, emitScrubNotice is still called (good), but findPersistedMessage / UpdateMessage are skipped — so the SQLite row is never updated. On the next session load the original oversized payload reloads from disk and the overflow trips again.
The existing test (TestRecoverFromOversizedTurn_NotifiesUserEvenWhenPersistFails) explicitly sets stored.ID = persistedID before sess.AddMessage(stored) to hit the UpdateMessage path — this pattern is not followed in production code paths (e.g. PersistenceObserver.UserMessageEvent).
Fix options:
- Have
SQLiteSessionStore.AddMessagecallmsg.ID = idon the caller's pointer (mirroringInMemorySessionStore), or - Have
loadSessionItemsscan and populate theidcolumn intoMessage.ID, or - Rewrite
findPersistedMessageto useRewriteLatestUserMessage's returned ID directly (skipping the search entirely), sinceRewriteLatestUserMessagealready has the pointer and can expose the rewritten message object without a second scan.
There was a problem hiding this comment.
Verified the claim — the bug reproduces in production for both store types because the persistence ID is never written back to the in-memory Message.ID (the observer discards the return value from Store.AddMessage), and even on reload from disk loadSessionItems doesn't SELECT id, so Message.ID stays 0 forever.
Fixed in the latest force-push with two surgical changes. Both are needed; either alone leaves a gap (writes propagate via PersistenceObserver, reads via loadSessionItems):
pkg/session/store.go—loadSessionItems: the SELECT now includesidand populatesMessage.IDon the loaded row. Pinned byTestSQLiteSessionStore_LoadSessionItems_PopulatesMessageID.pkg/runtime/persistence_observer.go—UserMessageEvent: afterStore.AddMessagereturns, the observer calls a newSession.AssignLatestUserMessageID(id)so the in-memory message picks up the persistence ID. Pinned byTestPersistenceObserver_UserMessageEvent_PropagatesID(SQLite-backed, mirrors the production runtime+store flow) andTestPersistenceObserver_UserMessageEvent_AddMessageFailure_DoesNotAssignIDfor the failure path.pkg/session/session.go—AssignLatestUserMessageID: new helper used by the observer. Race-safe (unders.mu), refuses to overwrite a non-zero ID so two concurrent writers can't clobber each other. Pinned byTestSession_AssignLatestUserMessageID(covers happy path, zero-ID no-op, overwrite refusal, no-user-message, empty session).
End-to-end coverage: TestRecoverFromOversizedTurn_EndToEndPersistsScrub walks the full pipeline (PersistenceObserver writes via SQLite, scrub fires, reload from DB) and asserts the reloaded row reflects the scrubbed content with the same persistence ID — which is the exact regression the reviewer described.
Left findPersistedMessage and the existing scrub structure intact since the now-populated Message.ID makes it work correctly without further refactoring. The reviewer's option 3 (return the rewritten message pointer directly) is a code-simplification that can land as a follow-up if desired.
There was a problem hiding this comment.
Excellent fix — verified the two-pronged approach covers both write (observer propagates ID back to session) and read (loadSessionItems scans the id column on reload) paths, and the test coverage pins all entry points and failure modes as described.
The decision to preserve findPersistedMessage rather than refactor to option 3 is the right call for this PR: the now-populated Message.ID makes the search work correctly, and the "return the pointer directly" simplification can land as a follow-up without blocking the regression fix. Keeping the scope surgical reduces rebase risk and makes the history easier to bisect later.
One thing I'll note for the future: AssignLatestUserMessageID refuses to overwrite a non-zero ID (the "two concurrent writers can't clobber each other" guard), which is great for safety, but that means if two observers ever race on the same AddMessage return value, one silently loses. Not a problem in the current single-observer setup, but worth remembering if the observer wiring changes. The silent false return is appropriate for now — logging would just be noise.
Marking resolved. Thanks for the thorough fix and test coverage!
| // | ||
| // Returns the database ID of the rewritten Item.Message (used by | ||
| // callers to persist the rewrite via [Store.UpdateMessage]) together | ||
| // with ok=true. Returns (0, false) when the session has no user |
There was a problem hiding this comment.
[LOW] Docstring contradicts implementation: RewriteLatestUserMessage returns (0, true) for unpersisted messages, not (0, false)
The doc comment says:
Returns (0, false) when … the latest user message has not been persisted yet (ID is zero, e.g. a synthesised implicit message).
But the implementation returns (item.Message.ID, true) unconditionally after a successful rewrite — which is (0, true) when ID == 0.
The current caller (recoverFromOversizedTurn) handles this correctly via the if messageID == 0 guard. However, a future caller who trusts the docstring might check only ok == false to detect "nothing happened" and miss the case where an unpersisted message was rewritten in memory.
The docstring should be corrected to match the actual contract: (0, true) when the message was rewritten but had no persisted ID.
There was a problem hiding this comment.
Good catch — the docstring did claim (0, false) for unpersisted messages while the implementation returned (0, true). Fixed in the latest force-push: the doc now says "Returns the database ID of the rewritten Item.Message and ok=true on a successful rewrite. The returned ID is zero when the rewritten message has not been persisted yet ...; callers that intend to mirror the rewrite to the store via Store.UpdateMessage must check for a non-zero ID before doing so. Returns (0, false) when the session has no user messages or when rewrite returns false."
Note: with the related fix in this same force-push (PersistenceObserver now propagates the persistence ID back to the in-memory session, loadSessionItems now reads the id column), the ID == 0 case shrinks to genuinely-unpersisted messages — the production runtime/store flow no longer hits it for real user messages, which was the regression the other reviewer flagged. The docstring contract is now both correct and unambiguous about the rare remaining case.
There was a problem hiding this comment.
Excellent fix! The docstring now accurately describes the contract: (0, true) when the message was successfully rewritten but had no persisted ID, and (0, false) only when nothing happened (no user messages or rewrite declined).
Thanks for the context on the broader fix — propagating persistence IDs back from the observer to the in-memory session means the ID == 0 edge case is now genuinely rare (only unpersisted synthesized messages), which makes the API much clearer for future callers.
When the provider rejects a request because the body itself is over the wire-size cap or contains an oversized attachment, the offending user message stays verbatim in the session. Every subsequent call reloads that message as part of the conversation history and trips the same limit. The session is effectively dead until the user starts over. Add a hygiene step that runs on wire- and media-overflow rejections: walk back to the latest user message, replace each media part (image, file, document) with a text placeholder that records what was attached, and replace plain-text content over 1 MiB with a size-noting placeholder. The rewrite is mirrored to the session store so the next session load reflects it; the in-memory mutation alone keeps the current process healthy even if the store write fails. A Warning event is emitted so the UI can tell the user that their previous message was rewritten in place. The fatal ErrorEvent for the original rejection is still emitted — scrubbing is in addition to surfacing the error, not instead of it. Token-overflow is unchanged: it still goes through auto-compaction, which is the correct mechanism for that shape of failure.
b334412 to
354f333
Compare
Stacked on #2819. Review after that one merges.
After #2818 + #2819, an oversized message produces the right error message and fails fast — but the session is still poisoned. The offending user message sits verbatim in the session store, every subsequent call reloads it as part of the conversation history, and the same limit trips again. The user has to start a new chat to recover.
The end result for the user: "I shortened my paste but it kept failing."
What this change does
After a wire- or media-overflow rejection, walk back to the most recent user message and rewrite it in place:
The rewrite is then persisted to the session store, so on the next reload the conversation history no longer carries the payload that just got rejected. The error itself is still emitted with the same kind-specific code from #2818, and a Warning event explains to the user that the previous message was rewritten — so the hygiene action is observable rather than silent.
Schemas