fix(agent): don't fail a completed run when the forced "done" call errors #2269

Open
akeimach wants to merge 5 commits into main from alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on

Conversation

@akeimach akeimach commented Jun 24, 2026


why

Edit: pulling in the description from @filip-michalsky on #2270

Root-cause fix for STG-2335: a non-CUA agent.execute() that successfully
completed was reported as { success: false }, with a red "Invalid prompt:
messages must be a ModelMessage[]" error logged after all the work had
already completed.
This replaces the symptom patch (wrapping the finalization in try/catch and
forcing state.completed = true) with a fix for the actual defect.

Root cause

After the main agent loop finishes, ensureDone() runs a forced "done"
finalization (handleDoneToolCall) that re-submits the accumulated run
history into a fresh generateText call to produce the structured
su… …-validates accumulated tool results, but this re-submission does.
When a custom tool returns an object with an optional field left
undefined (e.g. PermitFlow's captureField returning { matchedExpected:
undefined } when no expectedText is passed), that undefined lands inside a
tool-result output.value. The AI SDK's ModelMessage validation
(standardizePrompt) rejects it, because its JSON-value schema disallows
undefined (only null/string/number/boolean/object/array). The finalization
throws, flipping the result to { success: false } even though every action
succeeded.

▎ Note: the original "reasoning traces" hypothesis was ruled out: reasoning
parts come back with a valid text: "" and pass validation. The undefined
tool-result field is the trigger.
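
To make the root cause concrete, here is a minimal sketch of a JSON-value check. The `isJsonValue` helper and the sample `toolOutput` are hypothetical and written only for illustration; the AI SDK's real validation is schema-based and is not reproduced here.

```typescript
// Hypothetical, simplified stand-in for a JSON-value schema check.
// Not the AI SDK's implementation.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      return true;
    case "object":
      if (Array.isArray(value)) return value.every((item) => isJsonValue(item));
      return Object.values(value as Record<string, unknown>).every((entry) =>
        isJsonValue(entry),
      );
    default:
      // undefined, functions, symbols, and bigints are not JSON values.
      return false;
  }
}

// Illustrative tool output: an optional field left undefined,
// as in the captureField example from the description.
const toolOutput = { text: "ok", matchedExpected: undefined };

console.log(isJsonValue(toolOutput)); // false
console.log(isJsonValue({ text: "ok" })); // true
```

A single nested `undefined` is enough to make the whole value fail the check, which matches the behavior described above: one optional field invalidates the entire re-submitted history.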

what changed

sanitizeMessagesForResubmission() deep-strips undefined from the run
history before the forced "done" call, keeping all real content. It only
traverses plain objects/arrays, so class instances (URL, typed arrays, Date)
pass through untouched.
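
The deep-strip described above could look roughly like the sketch below. This is not the PR's `sanitizeMessagesForResubmission()`: the names `isPlainObject` and `stripUndefinedDeep`, the simplified message shape, and the `screenshot` field are all assumptions made for illustration.

```typescript
// Illustrative sketch only; not the PR's sanitizeMessagesForResubmission().
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    // Assumption: keep array length stable by turning undefined slots into
    // null, which is what JSON.stringify does.
    return value.map((item) =>
      item === undefined ? null : stripUndefinedDeep(item),
    ) as unknown as T;
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, entry] of Object.entries(value)) {
      // Drop keys whose value is undefined; recurse into everything else.
      if (entry !== undefined) out[key] = stripUndefinedDeep(entry);
    }
    return out as unknown as T;
  }
  // Primitives and class instances (Date, typed arrays, ...) pass through.
  return value;
}

const pixels = new Uint8Array([1, 2, 3]);
const message = {
  role: "tool",
  content: [
    {
      type: "tool-result",
      output: { type: "json", value: { text: "ok", matchedExpected: undefined } },
    },
  ],
  screenshot: pixels, // hypothetical field, only here to show pass-through
};

const clean = stripUndefinedDeep(message);
console.log("matchedExpected" in clean.content[0].output.value); // false
console.log(clean.screenshot === pixels); // true
```

Checking the prototype rather than `typeof value === "object"` is what keeps binary data intact: a typed array is an object, but its prototype is not `Object.prototype`, so it is returned by reference instead of being rebuilt key by key.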

test plan

  • 4 unit tests in agent-finalization-resilience.test.ts aga…: reproduces
    InvalidPromptError with an undefined tool-result field → fixed by
    sanitize → real content (reasoning/tool-call/text) preserved → class
    instances untouched. All pass.
  • End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P…): fails on main
    (success=false, red error), succeeds on this branch (success=true,
    completed=true, no error).

Summary by cubic

Prevents non-CUA agent.execute() from falsely failing after a successful run by sanitizing run history before the forced "done" call and making finalization best-effort. Fixes STG-2335 with openai/gpt-5.x by stripping nested undefined values that break SDK prompt validation.

  • Bug Fixes
    • Deep-strip undefined from re-submitted messages via sanitizeMessagesForResubmission; preserve real content and class instances; use sanitizer in handleDoneToolCall and null-guard result.toolCalls.
    • If the forced "done" call throws, log a warning and synthesize the completion from the run instead of failing it.
    • Add unit tests covering the InvalidPromptError repro, sanitization behavior, class-instance pass-through, and finalization-failure fallback.

Written for commit 50d3a30. Summary will update on new commits.

…rors

Non-CUA agent.execute() ran a forced "done" finalization step that
re-submits the accumulated run history to the model. With some providers
(notably reasoning models like openai/gpt-5.x) that history is rejected by
the AI SDK ("Invalid prompt: messages must be a ModelMessage[]"), which was
logged as a red error and flipped the result to { success: false } even
though every agent action had already completed successfully.

Make finalization best-effort: if handleDoneToolCall throws, log a warning
with the underlying cause and synthesize a completion from the run instead
of failing it. Also null-guard result.toolCalls in handleDoneToolCall.

Adds a regression unit test covering the finalization-failure path.

Ref: Pylon #19999 / STG-2335
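
A minimal sketch of the best-effort pattern this commit message describes follows. Every name here (`finalizeBestEffort`, `FinalizeResult`, `Completion`, the `reasoning` field) is hypothetical; the real logic lives in ensureDone() and handleDoneToolCall, which are not shown on this page.

```typescript
// Hypothetical types; the real handler's shapes are not shown in the PR text.
type DoneCall = { toolName: string; input: { reasoning: string } };
type FinalizeResult = { toolCalls?: DoneCall[] };

interface Completion {
  completed: boolean;
  message: string;
}

async function finalizeBestEffort(
  runFinalization: () => Promise<FinalizeResult>,
  actionsTaken: string[],
  warn: (message: string) => void,
): Promise<Completion> {
  try {
    const result = await runFinalization();
    // Null-guard: treat a missing toolCalls array as empty.
    const done = (result.toolCalls ?? []).find(
      (call) => call.toolName === "done",
    );
    if (done) return { completed: true, message: done.input.reasoning };
    return { completed: false, message: "Finalization returned no done call" };
  } catch (error) {
    // Best-effort: log the underlying cause, then synthesize a completion
    // from what the run already did instead of failing it.
    const cause = error instanceof Error ? error.message : String(error);
    warn(`Forced "done" call failed; synthesizing completion: ${cause}`);
    return {
      completed: true,
      message: `Completed ${actionsTaken.length} action(s)`,
    };
  }
}

// Usage: a finalization that throws no longer fails the run.
finalizeBestEffort(
  () => Promise.reject(new Error("Invalid prompt")),
  ["click", "type"],
  (message) => console.warn(message),
).then((completion) => console.log(completion.completed)); // true
```

The trade-off is visible in the catch branch: it reports `completed: true` regardless of why finalization failed, so a genuinely incomplete run would be labeled the same way as a finished one.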

changeset-bot Bot commented Jun 24, 2026

🦋 Changeset detected

Latest commit: 50d3a30

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |


@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 4 files

Confidence score: 3/5

  • In packages/core/lib/v3/handlers/v3AgentHandler.ts, the fallback path sets completed=true even when done-finalization fails, which can mislabel incomplete runs as successful and hide max-steps stop conditions from callers; gate completed on successful finalization (or surface a failed/incomplete state) before merging.


Comment thread packages/core/lib/v3/handlers/v3AgentHandler.ts
filip-michalsky added a commit that referenced this pull request Jun 24, 2026
…alization

Root-cause fix for STG-2335. The forced "done" finalization re-submits the
accumulated run history to the model. When a custom tool returns an object
with an optional field left `undefined` (e.g. PermitFlow's captureField
returning `{ matchedExpected: undefined }`), that `undefined` lands inside a
tool-result `output.value`. The AI SDK's prompt validation rejects it — its
JSON-value schema disallows `undefined` — throwing "Invalid prompt: messages
must be a ModelMessage[]". That flipped a completed run to { success: false }
with a red error, even though every action had already succeeded.

Deep-strip `undefined` from the history before the finalization call, keeping
all real content. Class instances (URL, typed arrays, Date) are passed through
untouched so binary image data isn't corrupted.

Replaces PR #2269's best-effort try/catch (which masked the error and forced
completed=true); reverts that fallback in ensureDone and rewrites the test to
cover the undefined-tool-result re-submission path.

Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main,
succeeds on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
akeimach and others added 4 commits June 24, 2026 08:05
…l a completed run

Non-CUA agent.execute() runs a forced "done" finalization step that
re-submits the accumulated run history to the model. With some providers
(notably reasoning models like openai/gpt-5.x) that history carries nested
`undefined` values inside `providerOptions`, which the AI SDK rejects
("Invalid prompt: messages must be a ModelMessage[]" — providerOptions leaves
must be JSON values). This was logged as a red error and flipped the result
to { success: false } even though every agent action had already completed.

- Sanitize the run history (strip nested undefined, equivalent to a JSON
  round-trip) before the forced "done" call, so it succeeds and structured
  output is preserved.
- Defense-in-depth: if finalization still throws, log a warning and
  synthesize a completion instead of failing the run.
- Null-guard result.toolCalls in handleDoneToolCall.

Adds unit tests for the sanitizer and the finalization-failure fallback.

Ref: Pylon #19999 / STG-2335
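
This commit message describes the stripping as equivalent to a JSON round-trip. For plain data that holds, but a literal round-trip also rewrites class instances. The sketch below uses an invented object (`capturedAt` and `pixels` are illustrative field names) to show the difference using only standard JavaScript behavior.

```typescript
// Invented sample data; field names are illustrative only.
const original = {
  matchedExpected: undefined,
  capturedAt: new Date(0),
  pixels: new Uint8Array([1, 2, 3]),
};

// A literal JSON round-trip.
const roundTripped = JSON.parse(JSON.stringify(original));

// The undefined key is dropped, as intended.
console.log("matchedExpected" in roundTripped); // false

// But class instances do not survive: the Date becomes an ISO string...
console.log(roundTripped.capturedAt); // "1970-01-01T00:00:00.000Z"

// ...and the typed array becomes a plain object keyed by index.
console.log(roundTripped.pixels instanceof Uint8Array); // false
console.log(JSON.stringify(roundTripped.pixels)); // {"0":1,"1":2,"2":3}
```

This is one plausible reason a sanitizer would walk only plain objects and arrays rather than round-tripping the whole history: the undefined keys go away while binary payloads keep their type.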
…2270)

---
Title: fix(agent): strip undefined values from run history before "done"
finalization (STG-2335)

Base:
alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on

---
Summary

Root-cause fix for STG-2335: a non-CUA agent.execute() that successfully
completed was reported as { success: false }, with a red "Invalid prompt:
messages must be a ModelMessage[]" error logged after all the work had
already completed.
This replaces the symptom patch (wrapping the finalization in try/catch and
forcing state.completed = true) with a fix for the actual defect.

Root cause

After the main agent loop finishes, ensureDone() runs a forced "done"
finalization (handleDoneToolCall) that re-submits the accumulated run
history into a fresh generateText call to produce the structured
su… …-validates accumulated tool results, but this re-submission does.
When a custom tool returns an object with an optional field left
undefined (e.g. PermitFlow's captureField returning { matchedExpected:
undefined } when no expectedText is passed), that undefined lands inside a
tool-result output.value. The AI SDK's ModelMessage validation
(standardizePrompt) rejects it, because its JSON-value schema disallows
undefined (only null/string/number/boolean/object/array). The finalization
throws, flipping the result to { success: false } even though every action
succeeded.

▎ Note: the original "reasoning traces" hypothesis was ruled out: reasoning
parts come back with a valid text: "" and pass validation. The undefined
tool-result field is the trigger.

Fix

sanitizeMessagesForResubmission() deep-strips undefined from the run
history before the forced "done" call, keeping all real content. It only
traverses plain objects/arrays, so class instances (URL, typed arrays, Date)
pass through untouched.

Also reverts PR #2269's force-completed=true fallback, so g… fail loudly
again instead of being masked.

Testing

- 4 unit tests in agent-finalization-resilience.test.ts aga…: reproduces
InvalidPromptError with an undefined tool-result field → fixed by
sanitize → real content (reasoning/tool-call/text) preserved → class
instances untouched. All pass.
- End-to-end repro (openai/gpt-5.5 + custom tool, mirrors P…): fails on main
(success=false, red error), succeeds on this branch (success=true,
completed=true, no error).

---

Co-authored-by: Filip Michalsky <filip-michalsky@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h-custom-tool-has-error-on' of github.com:browserbase/stagehand into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on