fix(agent): don't fail a completed run when the forced "done" call errors #2269
Open
akeimach wants to merge 5 commits into
…rors
Non-CUA agent.execute() ran a forced "done" finalization step that
re-submits the accumulated run history to the model. With some providers
(notably reasoning models like openai/gpt-5.x) that history is rejected by
the AI SDK ("Invalid prompt: messages must be a ModelMessage[]"), which was
logged as a red error and flipped the result to { success: false } even
though every agent action had already completed successfully.
Make finalization best-effort: if handleDoneToolCall throws, log a warning
with the underlying cause and synthesize a completion from the run instead
of failing it. Also null-guard result.toolCalls in handleDoneToolCall.
Adds a regression unit test covering the finalization-failure path.
Ref: Pylon #19999 / STG-2335
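A minimal sketch of the best-effort finalization this commit message describes. The `RunState` and `DoneResult` shapes and the `callDone` callback (standing in for `handleDoneToolCall`) are hypothetical and are not taken from the actual handler. Note that the catch branch sets `completed` to true, which is the behavior the review comment on this PR questions.

```typescript
// Hypothetical shapes for illustration only.
type DoneResult = { message: string; toolCalls?: { toolName: string }[] };
type RunState = { completed: boolean; finalMessage: string; actions: string[] };

async function finalizeBestEffort(
  state: RunState,
  callDone: () => Promise<DoneResult>, // stands in for handleDoneToolCall
  warn: (msg: string) => void = console.warn,
): Promise<RunState> {
  try {
    const result = await callDone();
    // Null-guard: treat a missing toolCalls array as "no calls".
    const calledDone = (result.toolCalls ?? []).some((c) => c.toolName === "done");
    return { ...state, completed: calledDone, finalMessage: result.message };
  } catch (err) {
    // Best-effort path: warn with the underlying cause instead of failing the run.
    const cause = err instanceof Error ? err.message : String(err);
    warn(`"done" finalization failed, synthesizing completion: ${cause}`);
    return {
      ...state,
      completed: true,
      finalMessage: `Completed ${state.actions.length} action(s); finalization step failed.`,
    };
  }
}
```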
🦋 Changeset detected
Latest commit: 50d3a30
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
1 issue found across 4 files
Confidence score: 3/5
- In `packages/core/lib/v3/handlers/v3AgentHandler.ts`, the fallback path sets `completed=true` even when done-finalization fails, which can mislabel incomplete runs as successful and hide max-steps stop conditions from callers; gate `completed` on successful finalization (or surface a failed/incomplete state) before merging.
miguelg719 approved these changes Jun 24, 2026
filip-michalsky added a commit that referenced this pull request Jun 24, 2026
…alization
Root-cause fix for STG-2335. The forced "done" finalization re-submits the
accumulated run history to the model. When a custom tool returns an object
with an optional field left `undefined` (e.g. PermitFlow's captureField
returning `{ matchedExpected: undefined }`), that `undefined` lands inside a
tool-result `output.value`. The AI SDK's prompt validation rejects it — its
JSON-value schema disallows `undefined` — throwing "Invalid prompt: messages
must be a ModelMessage[]". That flipped a completed run to { success: false }
with a red error, even though every action had already succeeded.
Deep-strip `undefined` from the history before the finalization call, keeping
all real content. Class instances (URL, typed arrays, Date) are passed through
untouched so binary image data isn't corrupted.
Replaces PR #2269's best-effort try/catch (which masked the error and forced
completed=true); reverts that fallback in ensureDone and rewrites the test to
cover the undefined-tool-result re-submission path.
Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main,
succeeds on this branch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l a completed run
Non-CUA agent.execute() runs a forced "done" finalization step that
re-submits the accumulated run history to the model. With some providers
(notably reasoning models like openai/gpt-5.x) that history carries nested
`undefined` values inside `providerOptions`, which the AI SDK rejects
("Invalid prompt: messages must be a ModelMessage[]" — providerOptions leaves
must be JSON values). This was logged as a red error and flipped the result
to { success: false } even though every agent action had already completed.
- Sanitize the run history (strip nested undefined, equivalent to a JSON
round-trip) before the forced "done" call, so it succeeds and structured
output is preserved.
- Defense-in-depth: if finalization still throws, log a warning and
synthesize a completion instead of failing the run.
- Null-guard result.toolCalls in handleDoneToolCall.
Adds unit tests for the sanitizer and the finalization-failure fallback.
Ref: Pylon #19999 / STG-2335
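The commit message above calls the sanitizer "equivalent to a JSON round-trip". As a hedged illustration of what a literal round-trip does, the sketch below runs one on a made-up message; the field names are hypothetical and not taken from the real run history. It drops `undefined` object properties, but it also turns `undefined` array entries into `null` and serializes a `Date` to a string, so it is only equivalent for plain JSON data.

```typescript
// Hypothetical message shape, for illustration only.
const message = {
  role: "tool",
  providerOptions: { openai: { itemId: undefined, effort: "low" } },
  createdAt: new Date(0),
  parts: [1, undefined, 3],
};

const roundTripped = JSON.parse(JSON.stringify(message));

// undefined object properties are dropped entirely
console.log("itemId" in roundTripped.providerOptions.openai); // false
// undefined array entries become null
console.log(roundTripped.parts); // [ 1, null, 3 ]
// class instances are not preserved: the Date is now an ISO string
console.log(roundTripped.createdAt); // "1970-01-01T00:00:00.000Z"
```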
…2270)
---
Title: fix(agent): strip undefined values from run history before "done" finalization (STG-2335)
Base: alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on
---
Summary
Root-cause fix for STG-2335: a non-CUA agent.execute() that reports a successful run as { success: false }, with a red "Invalid prompt: messages must be a ModelMessage[]" error logged after all the work already completed.
This replaces the symptom patch (wrap the finalization in try/catch and force state.completed = true) with a fix for the actual defect.
Root cause
After the main agent loop finishes, ensureDone() runs a forced "done" finalization (handleDoneToolCall) that re-submits the accumulated run history into a fresh generateText call to produce the structured su[…] […]-validates accumulated tool results, but this re-submission does.
When a custom tool returns an object with an optional field left undefined (e.g. PermitFlow's captureField returning { matchedExpected: undefined } when no expectedText is passed), that undefined lands inside a tool-result output.value.
The AI SDK's ModelMessage validation (standardizePrompt) rejects it, because its JSON-value schema disallows undefined (only null/string/number/boolean/object/array). The finalization throws, flipping the result to { success: false } even though every action succeeded.
▎ Note: the original "reasoning traces" hypothesis was ruled out: […] parts come back with a valid text: "" and pass validation. The undefined tool-result field is the trigger.
Fix
sanitizeMessagesForResubmission() deep-strips undefined from the run history before the forced "done" call, keeping all real content. It only traverses plain objects/arrays, so class instances (URL, ty[…]ata, Date) pass through untouched.
Also reverts PR #2269's force-completed=true fallback, so g[…] fail loudly again instead of being masked.
Testing
- 4 unit tests in agent-finalization-resilience.test.ts aga[…]ces InvalidPromptError with an undefined tool-result field → fixed by sanitize → real content (reasoning/tool-call/text) preserved → class instances untouched. All pass.
- End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P[…] on main (success=false, red error), succeeds on this branch (success=true, completed=true, no error).
---
Co-authored-by: Filip Michalsky <filip-michalsky@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h-custom-tool-has-error-on' of github.com:browserbase/stagehand into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on
why
Edit: pulling in the description from @filip-michalsky on #2270
Root-cause fix for STG-2335: a non-CUA agent.execute() that reports a
successful run as { success: false }, with a red "Invalid prompt: messages
must be a ModelMessage[]" error logged after all the work already completed.
This replaces the symptom patch (wrap the finalization in try/catch and
force state.completed = true) with a fix for the actual defect.
Root cause
After the main agent loop finishes, ensureDone() runs a forced "done"
finalization (handleDoneToolCall) that re-submits the accumulated run
history into a fresh generateText call to produce the structured
su[…] […]-validates accumulated tool results, but this re-submission does.
When a custom tool returns an object with an optional field left
undefined (e.g. PermitFlow's captureField returning { matchedExpected:
undefined } when no expectedText is passed), that undefined lands inside
a tool-result output.value.
The AI SDK's ModelMessage validation (standardizePrompt) rejects it,
because its JSON-value schema disallows undefined (only
null/string/number/boolean/object/array). The finalization throws,
flipping the result to { success: false } even though every action
succeeded.
▎ Note: the original "reasoning traces" hypothesis was ruled out: […]
parts come back with a valid text: "" and pass validation. The undefined
tool-result field is the trigger.
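To make the trigger concrete, here is a small diagnostic sketch that lists every path in a history where an `undefined` value sits inside plain objects or arrays. `findUndefinedPaths` is a hypothetical helper, not part of the PR or the AI SDK, and the message shape below is an assumed approximation of a tool-result message rather than the SDK's exact type.

```typescript
// Hypothetical helper: returns the path of every `undefined` nested in
// plain objects and arrays. Class instances are not traversed.
function findUndefinedPaths(value: unknown, path = "$"): string[] {
  if (value === undefined) return [path];
  const found: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => found.push(...findUndefinedPaths(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    if (proto === Object.prototype || proto === null) {
      for (const key of Object.keys(value)) {
        const child = (value as Record<string, unknown>)[key];
        found.push(...findUndefinedPaths(child, `${path}.${key}`));
      }
    }
  }
  return found;
}

// Assumed history shape: a custom tool left an optional field undefined.
const history = [
  { role: "user", content: "Capture the permit number" },
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        toolCallId: "call_1",
        toolName: "captureField",
        output: { type: "json", value: { captured: "A-123", matchedExpected: undefined } },
      },
    ],
  },
];

console.log(findUndefinedPaths(history));
// [ "$[1].content[0].output.value.matchedExpected" ]
```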
what changed
sanitizeMessagesForResubmission() deep-strips undefined from the run
history before the forced "done" call, keeping all real content. It only
traverses plain objects/arrays, so class instances (URL, ty[…]ata, Date)
pass through untouched.
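A minimal sketch of the deep-strip technique described above, not the PR's actual `sanitizeMessagesForResubmission()` implementation. `isPlainObject` and `stripUndefinedDeep` are hypothetical names, and dropping `undefined` array entries (rather than converting them to `null`) is an assumption made here.

```typescript
// Hypothetical helper: true only for object literals and null-prototype objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Returns a copy with every nested `undefined` removed. The input is not mutated.
function stripUndefinedDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Assumption: undefined array entries are dropped.
    return value
      .filter((item) => item !== undefined)
      .map((item) => stripUndefinedDeep(item));
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out;
  }
  // Primitives and class instances (Date, Uint8Array, URL, ...) pass through
  // by reference, so binary data is not copied or re-encoded.
  return value;
}

const image = new Uint8Array([137, 80, 78, 71]);
const toolOutput = {
  captured: "A-123",
  matchedExpected: undefined,
  screenshot: image,
  takenAt: new Date(0),
};

const cleaned = stripUndefinedDeep(toolOutput) as Record<string, unknown>;
console.log("matchedExpected" in cleaned); // false
console.log(cleaned.screenshot === image); // true
```

Checking the prototype rather than `typeof value === "object"` alone is what keeps typed arrays and dates intact; a JSON round-trip would not preserve them.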
test plan
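- 4 unit tests in agent-finalization-resilience.test.ts aga[…]ces
InvalidPromptError with an undefined tool-result field → fixed by
sanitize → real content (reasoning/tool-call/text) preserved → class
instances untouched. All pass.
- End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P[…] on main
(success=false, red error), succeeds on this branch (success=true,
completed=true, no error).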
Summary by cubic
Prevents non-CUA `agent.execute()` from falsely failing after a successful run by sanitizing run history before the forced "done" call and making finalization best-effort. Fixes STG-2335 with `openai/gpt-5.x` by stripping nested `undefined` values that break SDK prompt validation.
- Strip `undefined` from re-submitted messages via `sanitizeMessagesForResubmission`; preserve real content and class instances; use sanitizer in `handleDoneToolCall` and null-guard `result.toolCalls`.
Written for commit 50d3a30. Summary will update on new commits.