fix(agent): don't fail a completed run when the forced "done" call errors by akeimach · Pull Request #2269 · browserbase/stagehand

akeimach · 2026-06-24T00:04:29Z

why

Edit: pulling in the description from @filip-michalsky on #2270

Root-cause fix for STG-2335: a non-CUA agent.execute() run that completes
successfully is reported as { success: false }, with a red "Invalid prompt:
messages must be a ModelMessage[]" error logged after all the work has
already completed. This replaces the symptom patch (wrap the finalization in
try/catch and force state.completed = true) with a fix for the actual defect.

Root cause

After the main agent loop finishes, ensureDone() runs a forced "done"
finalization (handleDoneToolCall) that re-submits the accumulated run
history into a fresh generateText call to produce the structured
su…-validates accumulated tool results, but this re-submission does.
When a custom tool returns an object with an optional field left
undefined (e.g. PermitFlow's captureField returning { matchedExpected:
undefined } when no expectedText is passed), that undefined lands inside
a tool-result output.value.
The AI SDK's ModelMessage validation (standardizePrompt) rejects it,
because its JSON-value schema disallows undefined (only
null/string/number/boolean/object/array). The finalization throws,
flipping the result to { success: false } even though every action
succeeded.

▎ Note: the original "reasoning traces" hypothesis was ruled out: reasoning
parts come back with a valid text: "" and pass validation. The undefined
tool-result field is the trigger.
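
The rejection described under "Root cause" can be illustrated with a hand-written JSON-value check. This is a minimal sketch, not the AI SDK's actual schema: `isJsonValue` and the `toolOutput` object are hypothetical names introduced only for illustration, and the sketch simply treats any `undefined` leaf as invalid while accepting `null`.

```typescript
// Illustrative JSON-value check. This is NOT the AI SDK's real schema;
// it only mirrors the rule "null/string/number/boolean/object/array".
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  const kind = typeof value;
  if (kind === "string" || kind === "number" || kind === "boolean") return true;
  if (Array.isArray(value)) return value.every((item) => isJsonValue(item));
  if (kind === "object") {
    // Every property value must itself be a JSON value.
    return Object.values(value as Record<string, unknown>).every((item) =>
      isJsonValue(item),
    );
  }
  // undefined, functions, symbols and bigints are not JSON values.
  return false;
}

// Hypothetical tool result shaped like the captureField example above.
const toolOutput = { text: "Permit #123", matchedExpected: undefined };

// The undefined leaf makes the whole object invalid...
const rejected = !isJsonValue(toolOutput);
// ...while an explicit null (or omitting the key) is accepted.
const acceptedWithNull = isJsonValue({ text: "Permit #123", matchedExpected: null });
```

The point of the sketch is that a single optional field left `undefined` anywhere in the history is enough to fail validation of the whole message array.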

what changed

sanitizeMessagesForResubmission() deep-strips undefined from the run
history before the forced "done" call, keeping all real content. It only
traverses plain objects/arrays, so class instances (URL, typed arrays, Date)
pass through untouched.
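
A deep strip of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the code in this PR: `stripUndefinedDeep` and `isPlainObject` are hypothetical names, the message shapes are simplified, and mapping `undefined` array elements to `null` is a choice made here to mirror what `JSON.stringify` does.

```typescript
// Hypothetical sketch; the real sanitizeMessagesForResubmission may differ.

// True only for object literals and Object.create(null) objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    // Assumption: keep array length stable and turn undefined items into
    // null, as JSON.stringify would.
    const items = value.map((item) =>
      item === undefined ? null : stripUndefinedDeep(item),
    );
    return items as unknown as T;
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      // Drop keys whose value is undefined; recurse into everything else.
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out as unknown as T;
  }
  // Primitives and class instances (URL, Uint8Array, Date, ...) pass through
  // by reference, so binary data is never copied or altered.
  return value;
}

// Simplified, hypothetical run history.
const imageUrl = new URL("https://example.com/screenshot.png");
const history = [
  {
    role: "tool",
    content: [
      { type: "tool-result", output: { type: "json", value: { text: "ok", matchedExpected: undefined } } },
    ],
  },
  { role: "user", content: [{ type: "image", image: imageUrl }] },
];

const sanitized = stripUndefinedDeep(history);
```

Restricting traversal to plain objects and arrays is what keeps the function safe for class instances: a `URL` or typed array is returned as the same reference rather than being rebuilt key by key.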

test plan

4 unit tests in agent-finalization-resilience.test.ts: reproduce the
InvalidPromptError with an undefined tool-result field → fixed by
sanitize → real content (reasoning/tool-call/text) preserved → class
instances untouched. All pass.
End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P…): fails on main
(success=false, red error), succeeds on this branch (success=true,
completed=true, no error).

Summary by cubic

Prevents non-CUA agent.execute() from falsely failing after a successful run by sanitizing run history before the forced "done" call and making finalization best-effort. Fixes STG-2335 with openai/gpt-5.x by stripping nested undefined values that break SDK prompt validation.

Bug Fixes
- Deep-strip undefined from re-submitted messages via sanitizeMessagesForResubmission; preserve real content and class instances; use sanitizer in handleDoneToolCall and null-guard result.toolCalls.
- If the forced "done" call throws, log a warning and synthesize the completion from the run instead of failing it.
- Add unit tests covering the InvalidPromptError repro, sanitization behavior, class-instance pass-through, and finalization-failure fallback.
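
The best-effort finalization in the second bullet could be sketched as follows. This is only an illustration under stated assumptions: `RunResult`, `finalizeBestEffort`, and the `forcedDone` callback (standing in for handleDoneToolCall) are hypothetical and do not reflect Stagehand's real types or signatures.

```typescript
// Hypothetical shapes for illustration; not Stagehand's real types.
interface RunResult {
  success: boolean;
  completed: boolean;
  message: string;
}

type Logger = (message: string) => void;

async function finalizeBestEffort(
  actionsTaken: string[],
  forcedDone: () => Promise<string>, // stands in for handleDoneToolCall
  warn: Logger = console.warn,
): Promise<RunResult> {
  try {
    const message = await forcedDone();
    return { success: true, completed: true, message };
  } catch (error) {
    // Log the underlying cause as a warning instead of failing the run.
    const cause = error instanceof Error ? error.message : String(error);
    warn(`Forced "done" finalization failed: ${cause}`);
    // Synthesize a completion from what the run already did.
    return {
      success: true,
      completed: true,
      message: `Finished after ${actionsTaken.length} action(s); final summary unavailable.`,
    };
  }
}

// Usage: a finalization call that always throws still yields a result.
const warnings: string[] = [];
const fallbackResult = finalizeBestEffort(
  ["click", "type"],
  async () => {
    throw new Error("Invalid prompt: messages must be a ModelMessage[]");
  },
  (message) => warnings.push(message),
);
```

Note that this sketch sets completed to true unconditionally in the fallback branch, which is exactly the behavior the cubic review comment in this thread questions; a stricter variant would surface an incomplete state instead.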

Written for commit 50d3a30. Summary will update on new commits.

…rors

Non-CUA agent.execute() ran a forced "done" finalization step that re-submits the accumulated run history to the model. With some providers (notably reasoning models like openai/gpt-5.x) that history is rejected by the AI SDK ("Invalid prompt: messages must be a ModelMessage[]"), which was logged as a red error and flipped the result to { success: false } even though every agent action had already completed successfully.

Make finalization best-effort: if handleDoneToolCall throws, log a warning with the underlying cause and synthesize a completion from the run instead of failing it. Also null-guard result.toolCalls in handleDoneToolCall.

Adds a regression unit test covering the finalization-failure path.

Ref: Pylon #19999 / STG-2335

changeset-bot · 2026-06-24T00:04:35Z

🦋 Changeset detected

Latest commit: 50d3a30

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server-v3	Patch


cubic-dev-ai

1 issue found across 4 files

Confidence score: 3/5

In packages/core/lib/v3/handlers/v3AgentHandler.ts, the fallback path sets completed=true even when done-finalization fails, which can mislabel incomplete runs as successful and hide max-steps stop conditions from callers; gate completed on successful finalization (or surface a failed/incomplete state) before merging.


…alization

Root-cause fix for STG-2335. The forced "done" finalization re-submits the accumulated run history to the model. When a custom tool returns an object with an optional field left `undefined` (e.g. PermitFlow's captureField returning `{ matchedExpected: undefined }`), that `undefined` lands inside a tool-result `output.value`. The AI SDK's prompt validation rejects it (its JSON-value schema disallows `undefined`), throwing "Invalid prompt: messages must be a ModelMessage[]". That flipped a completed run to { success: false } with a red error, even though every action had already succeeded.

Deep-strip `undefined` from the history before the finalization call, keeping all real content. Class instances (URL, typed arrays, Date) are passed through untouched so binary image data isn't corrupted.

Replaces PR #2269's best-effort try/catch (which masked the error and forced completed=true); reverts that fallback in ensureDone and rewrites the test to cover the undefined-tool-result re-submission path.

Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main, succeeds on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l a completed run

Non-CUA agent.execute() runs a forced "done" finalization step that re-submits the accumulated run history to the model. With some providers (notably reasoning models like openai/gpt-5.x) that history carries nested `undefined` values inside `providerOptions`, which the AI SDK rejects ("Invalid prompt: messages must be a ModelMessage[]"; providerOptions leaves must be JSON values). This was logged as a red error and flipped the result to { success: false } even though every agent action had already completed.

- Sanitize the run history (strip nested undefined, equivalent to a JSON round-trip) before the forced "done" call, so it succeeds and structured output is preserved.
- Defense-in-depth: if finalization still throws, log a warning and synthesize a completion instead of failing the run.
- Null-guard result.toolCalls in handleDoneToolCall.

Adds unit tests for the sanitizer and the finalization-failure fallback.

Ref: Pylon #19999 / STG-2335

…2270)

---
Title: fix(agent): strip undefined values from run history before "done" finalization (STG-2335)
Base: alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on
---

Summary

Root-cause fix for STG-2335: a non-CUA agent.execute() run that completes successfully is reported as { success: false }, with a red "Invalid prompt: messages must be a ModelMessage[]" error logged after all the work has already completed. This replaces the symptom patch (wrap the finalization in try/catch and force state.completed = true) with a fix for the actual defect.

Root cause

After the main agent loop finishes, ensureDone() runs a forced "done" finalization (handleDoneToolCall) that re-submits the accumulated run history into a fresh generateText call to produce the structured su…-validates accumulated tool results, but this re-submission does. When a custom tool returns an object with an optional field left undefined (e.g. PermitFlow's captureField returning { matchedExpected: undefined } when no expectedText is passed), that undefined lands inside a tool-result output.value. The AI SDK's ModelMessage validation (standardizePrompt) rejects it, because its JSON-value schema disallows undefined (only null/string/number/boolean/object/array). The finalization throws, flipping the result to { success: false } even though every action succeeded.

▎ Note: the original "reasoning traces" hypothesis was ruled out: reasoning parts come back with a valid text: "" and pass validation. The undefined tool-result field is the trigger.

Fix

sanitizeMessagesForResubmission() deep-strips undefined from the run history before the forced "done" call, keeping all real content. It only traverses plain objects/arrays, so class instances (URL, typed arrays, Date) pass through untouched. Also reverts PR #2269's force-completed=true fallback, so g… fail loudly again instead of being masked.

Testing

- 4 unit tests in agent-finalization-resilience.test.ts: reproduce the InvalidPromptError with an undefined tool-result field → fixed by sanitize → real content (reasoning/tool-call/text) preserved → class instances untouched. All pass.
- End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P…): fails on main (success=false, red error), succeeds on this branch (success=true, completed=true, no error).

---
Co-authored-by: Filip Michalsky <filip-michalsky@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…h-custom-tool-has-error-on' of github.com:browserbase/stagehand into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on

cubic-dev-ai Bot reviewed Jun 24, 2026


Comment thread packages/core/lib/v3/handlers/v3AgentHandler.ts

miguelg719 approved these changes Jun 24, 2026


filip-michalsky mentioned this pull request Jun 24, 2026

fix(agent): strip reasoning parts before forced "done" re-submission #2270

Merged

2 tasks

akeimach and others added 4 commits June 24, 2026 08:05

Merge branch 'alyssamaruyama/stg-2335-escalation-permitflow-agent-wit…

3bf4396

…h-custom-tool-has-error-on' of github.com:browserbase/stagehand into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on

fix lint

50d3a30

akeimach wants to merge 5 commits into main from alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on


3 participants
