fix(agent): strip reasoning parts before forced "done" re-submission by filip-michalsky · Pull Request #2270 · browserbase/stagehand

filip-michalsky · 2026-06-24T09:05:07Z

why

Root-cause fix for STG-2335: a non-CUA agent.execute() that uses a custom tool reports a successful run as { success: false }, with a red Invalid prompt: messages must be a ModelMessage[] error logged after all the work already completed.

After the main agent loop finishes, ensureDone() runs a forced "done" finalization (handleDoneToolCall) that re-submits the accumulated run history into a fresh generateText call. The main loop never re-validates accumulated tool results — but this re-submission does. When a custom tool returns an optional field left undefined (e.g. PermitFlow's captureField returning { matchedExpected: undefined } when no expectedText is passed), that undefined lands inside a tool-result output.value, and the AI SDK's ModelMessage validation (standardizePrompt) rejects it — its JSON-value schema disallows undefined. Finalization throws, flipping the result to { success: false } even though every action succeeded.
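To make the failure mode concrete, here is a minimal sketch of a JSON-value check. It is a simplified stand-in written for illustration, not the AI SDK's actual schema or `standardizePrompt` code, and the `toolOutput` shape is hypothetical. It shows why a key that is present with the value `undefined` is rejected, while the same object with the key omitted is accepted.

```typescript
// Simplified stand-in for a JSON-value check; NOT the AI SDK's real schema.
type Json = null | string | number | boolean | Json[] | { [key: string]: Json };

function isJsonValue(value: unknown): value is Json {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "boolean":
      return true;
    case "number":
      return Number.isFinite(value);
    case "object":
      if (Array.isArray(value)) return value.every((item) => isJsonValue(item));
      return Object.values(value as Record<string, unknown>).every((item) =>
        isJsonValue(item),
      );
    default:
      // undefined, functions, symbols, and bigints are not JSON values.
      return false;
  }
}

// Hypothetical tool output, shaped like the captureField example above.
const toolOutput = { text: "Permit #123", matchedExpected: undefined };

// Key present with value undefined: fails the check.
const rejected = !isJsonValue(toolOutput);
// Key omitted entirely: passes the check.
const accepted = isJsonValue({ text: "Permit #123" });
```

Under this reading, the tool itself did nothing wrong at call time; the problem only appears once its output is validated as part of a re-submitted message history.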

PR #2269 patched the symptom (wrap finalization in try/catch and force state.completed = true). This PR fixes the actual defect instead.
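The difference between the two approaches can be sketched as follows. This is an illustrative reduction, synchronous for brevity, with hypothetical names (`RunState`, `finalizeWithFallback`, `finalizeStrict`); it is not the code from either PR. It only shows why a catch-all fallback also hides genuine finalization failures.

```typescript
type RunState = { completed: boolean };

// Fallback pattern as described for PR #2269 (hypothetical reduction):
// any finalization error is swallowed and the run is marked completed.
function finalizeWithFallback(state: RunState, finalize: () => boolean): void {
  try {
    state.completed = finalize();
  } catch {
    state.completed = true; // masks every failure, not just the known one
  }
}

// Strict pattern: the error propagates, so the caller sees the real failure.
function finalizeStrict(state: RunState, finalize: () => boolean): void {
  state.completed = finalize();
}

// A finalization step that fails the way the issue describes.
const failing = (): boolean => {
  throw new Error("Invalid prompt: messages must be a ModelMessage[]");
};

const masked: RunState = { completed: false };
finalizeWithFallback(masked, failing);

let surfaced = false;
try {
  finalizeStrict({ completed: false }, failing);
} catch {
  surfaced = true;
}
```

With the fallback, `masked.completed` ends up `true` regardless of what went wrong; with the strict version the error reaches the caller, which is only acceptable once the known trigger has been removed at the source.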

Note

The original "reasoning traces" hypothesis was ruled out empirically — reasoning parts come back with a valid text: "" and pass validation. The undefined tool-result field is the real trigger.

what changed

Added sanitizeMessagesForResubmission() in handleDoneToolCall.ts, which deep-strips undefined from the run history before the forced "done" call. Only plain objects/arrays are traversed, so class instances (URL, typed arrays for binary image data, Date, …) pass through untouched.
Reverted the force-completed=true fallback introduced by PR #2269 (fix(agent): don't fail a completed run when the forced "done" call errors) in v3AgentHandler.ts, so genuine finalization failures fail loudly again instead of being masked.
Added unit coverage in agent-finalization-resilience.test.ts.
Added changeset.
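A deep strip of this kind could look roughly like the sketch below. It is not the PR's `sanitizeMessagesForResubmission`; the helper names are hypothetical, the message parts are illustrative shapes, and dropping `undefined` array elements (rather than keeping them) is an assumption. The point it illustrates is the traversal rule: only arrays and plain objects are rebuilt, and everything else is returned by reference.

```typescript
// Sketch only: the PR's actual implementation may differ in detail.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function stripUndefinedDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Assumption: undefined elements are dropped rather than kept.
    return value
      .filter((item) => item !== undefined)
      .map((item) => stripUndefinedDeep(item));
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out;
  }
  // Class instances (URL, Date, Uint8Array, ...) pass through by reference.
  return value;
}

// Illustrative parts, loosely modeled on the shapes named in this PR.
const toolResult = {
  type: "tool-result",
  output: { type: "json", value: { text: "ok", matchedExpected: undefined } },
};
const imagePart = { type: "image", image: new Uint8Array([1, 2, 3]) };

const cleanedResult = stripUndefinedDeep(toolResult) as typeof toolResult;
const cleanedImage = stripUndefinedDeep(imagePart) as typeof imagePart;
```

Checking the prototype instead of `typeof value === "object"` is what keeps binary data intact: a `Uint8Array` is an object, but spreading or re-keying it would turn it into a plain record of indices.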

test plan

Unit — 4 tests against the real ai SDK: reproduces InvalidPromptError with an undefined tool-result field → fixed by sanitizeMessagesForResubmission → real content (reasoning / tool-call / text) preserved → class instances untouched. All pass.
End-to-end — openai/gpt-5.5 + custom tool, mirrors PermitFlow script 038:

Branch	Result
`main`	❌ `success=false`, red `Invalid prompt` error
this branch	✅ `success=true`, `completed=true`, no error

changeset-bot · 2026-06-24T09:05:13Z

⚠️ No Changeset found

Latest commit: d30883b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes changesets to release 3 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server-v3	Patch


cubic-dev-ai

No issues found across 4 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant Agent as V3AgentHandler
    participant Done as handleDoneToolCall
    participant Strip as stripReasoningParts
    participant SDK as AI SDK (generateText)
    participant Provider as Model Provider

    Note over Agent,Provider: Finalization flow when agent finishes all work (state.completed=false)

    Agent->>Done: Call handleDoneToolCall with inputMessages (run history)
    Note over Done: Forced "done" finalization re-submits history to model
    
    Done->>Strip: Strip reasoning parts from inputMessages
    Note over Strip: Filter out assistant content parts with type "reasoning"<br/>Preserve tool-call/tool-result pairing
    
    alt Messages had reasoning parts
        Strip-->>Done: Cleaned messages (reduced length, no reasoning parts)
    else Messages had no reasoning parts
        Strip-->>Done: Messages unchanged (pass-through)
    end

    Done->>SDK: generateText(model, systemPrompt, cleaned messages + userPrompt, doneTool)
    
    alt SDK validation succeeds
        SDK->>Provider: Send validated prompt
        Provider-->>SDK: generateText response
        SDK-->>Done: Done tool result (taskComplete, reasoning)
    else SDK validation fails (e.g., missing text in reasoning part)
        Note over SDK: "Invalid prompt: messages must be a ModelMessage[]"<br/>This error path is now blocked by prior stripping
    end

    Done-->>Agent: { taskComplete, reasoning, output }
    
    alt taskComplete == true
        Agent->>Agent: Mark state.completed = true
        Agent-->>Agent: Return { messages, output }
    else taskComplete == false
        Agent->>Agent: Continue agent loop
    end

    Note over Agent,Provider: Previous behavior (removed in this PR)
    Note over Agent: Removed try/catch fallback that warned and synthesized completion
    Note over Agent: No longer masks the error - strip prevents it from occurring


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7565a93ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-24T09:10:13Z

+      if (m.role !== "assistant" || typeof m.content === "string") return m;
+      return {
+        ...m,
+        content: m.content.filter((p) => p.type !== "reasoning"),


Preserve Anthropic thinking blocks for tool-use histories

When the main agent runs Claude 4.6+/Fable 5, buildAgentProviderOptions enables Anthropic adaptive/always-on thinking, so tool-use assistant turns can contain signed reasoning blocks. This sanitizer removes those blocks from every assistant message while leaving the tool-call/tool-result history intact; Anthropic requires thinking blocks to be resent unmodified during tool use (see their extended thinking docs), so the forced finalization request can be rejected and, with the surrounding catch removed, the completed agent run is reported as failed. Limit this stripping to the OpenAI/malformed reasoning case or preserve Anthropic signed reasoning blocks.
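One way to read the second option in this suggestion is a filter that drops only reasoning parts without a signature. The sketch below uses simplified, hypothetical message types; in particular, the top-level `signature` field is an assumption made for illustration, and where a provider signature actually lives on a reasoning part would need to be checked against the AI SDK's types.

```typescript
// Hypothetical, simplified shapes; not the AI SDK's ModelMessage types.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string; signature?: string } // signature: assumed field
  | { type: "tool-call"; toolName: string };

type Message = { role: "user" | "assistant"; content: string | Part[] };

// Drop reasoning parts only when they carry no signature, so signed
// thinking blocks stay in place next to their tool calls.
function stripUnsignedReasoning(messages: Message[]): Message[] {
  return messages.map((m) => {
    if (m.role !== "assistant" || typeof m.content === "string") return m;
    return {
      ...m,
      content: m.content.filter(
        (p) => p.type !== "reasoning" || p.signature !== undefined,
      ),
    };
  });
}

const history: Message[] = [
  { role: "user", content: "Capture the permit number." },
  {
    role: "assistant",
    content: [
      { type: "reasoning", text: "" }, // unsigned: removed
      { type: "reasoning", text: "plan", signature: "sig-abc" }, // signed: kept
      { type: "tool-call", toolName: "captureField" },
    ],
  },
];

const result = stripUnsignedReasoning(history);
```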


…alization

Root-cause fix for STG-2335. The forced "done" finalization re-submits the accumulated run history to the model. When a custom tool returns an object with an optional field left `undefined` (e.g. PermitFlow's captureField returning `{ matchedExpected: undefined }`), that `undefined` lands inside a tool-result `output.value`. The AI SDK's prompt validation rejects it (its JSON-value schema disallows `undefined`), throwing "Invalid prompt: messages must be a ModelMessage[]". That flipped a completed run to { success: false } with a red error, even though every action had already succeeded.

Deep-strip `undefined` from the history before the finalization call, keeping all real content. Class instances (URL, typed arrays, Date) are passed through untouched so binary image data isn't corrupted.

Replaces PR #2269's best-effort try/catch (which masked the error and forced completed=true); reverts that fallback in ensureDone and rewrites the test to cover the undefined-tool-result re-submission path.

Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main, succeeds on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

akeimach

fabulous

cubic-dev-ai Bot reviewed Jun 24, 2026

chatgpt-codex-connector Bot reviewed Jun 24, 2026

filip-michalsky force-pushed the fm/stg-2335-strip-reasoning-done-finalization branch from d7565a9 to d30883b Compare June 24, 2026 09:33

akeimach approved these changes Jun 24, 2026

filip-michalsky merged commit 22bc215 into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on Jun 24, 2026
115 of 218 checks passed

akeimach mentioned this pull request Jun 25, 2026

fix(agent): don't fail a completed run when the forced "done" call errors #2269

Open
