fix(agent): strip reasoning parts before forced "done" re-submission #2270

Merged
filip-michalsky merged 1 commit into
alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on from
fm/stg-2335-strip-reasoning-done-finalization
Jun 24, 2026

Conversation

@filip-michalsky filip-michalsky commented Jun 24, 2026

Collaborator

why

Root-cause fix for STG-2335: a non-CUA agent.execute() that uses a custom tool reports a successful run as { success: false }, with a red "Invalid prompt: messages must be a ModelMessage[]" error logged after all the work has already completed.

After the main agent loop finishes, ensureDone() runs a forced "done" finalization (handleDoneToolCall) that re-submits the accumulated run history into a fresh generateText call. The main loop never re-validates accumulated tool results — but this re-submission does. When a custom tool returns an optional field left undefined (e.g. PermitFlow's captureField returning { matchedExpected: undefined } when no expectedText is passed), that undefined lands inside a tool-result output.value, and the AI SDK's ModelMessage validation (standardizePrompt) rejects it — its JSON-value schema disallows undefined. Finalization throws, flipping the result to { success: false } even though every action succeeded.
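
To make the failure mode concrete, here is a minimal sketch of a JSON-value check. It is not the AI SDK's validator; `isJsonValue` and the `toolOutput` shape are assumptions written for illustration, and only the `matchedExpected` field name comes from the description above. It shows why a single `undefined` field is enough to reject an otherwise valid tool result.

```typescript
// Hypothetical helper: mirrors the idea of a JSON-value schema,
// NOT the AI SDK's actual validation code.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      return true;
    case "object": {
      if (Array.isArray(value)) return value.every((item) => isJsonValue(item));
      const obj = value as Record<string, unknown>;
      // Object.keys includes keys whose value is undefined, so those are checked too.
      return Object.keys(obj).every((key) => isJsonValue(obj[key]));
    }
    default:
      // undefined, functions, symbols and bigints are not JSON values.
      return false;
  }
}

// Illustrative tool-result output with an optional field left undefined.
const toolOutput = { type: "json", value: { matchedExpected: undefined } };

console.log(isJsonValue(toolOutput)); // false
console.log(isJsonValue({ type: "json", value: {} })); // true
```

The second call passes because the offending key is absent rather than present with an `undefined` value, which is the distinction the fix relies on.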

PR #2269 patched the symptom (wrap finalization in try/catch and force state.completed = true). This PR fixes the actual defect instead.
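
The difference between the two approaches can be sketched roughly as follows. These functions are hypothetical and synchronous for brevity; they are not the code from either PR, and `RunState`, `finalizeMasked`, and `finalizeLoud` are names invented for this illustration.

```typescript
// Hypothetical, simplified state; the real agent state has more fields.
type RunState = { completed: boolean };

// Symptom patch: swallow any finalization error and mark the run complete anyway.
function finalizeMasked(state: RunState, finalize: () => void): void {
  try {
    finalize();
  } catch (err) {
    console.warn("finalization failed, marking run as completed anyway:", err);
  }
  state.completed = true;
}

// Without the fallback: a genuine finalization failure propagates to the caller.
function finalizeLoud(state: RunState, finalize: () => void): void {
  finalize();
  state.completed = true;
}
```

With the masked variant, an unrelated finalization bug would also be reported as a completed run, which is why removing the fallback only makes sense once the underlying validation error no longer occurs.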

Note

The original "reasoning traces" hypothesis was ruled out empirically — reasoning parts come back with a valid text: "" and pass validation. The undefined tool-result field is the real trigger.

what changed

  • Added sanitizeMessagesForResubmission() in handleDoneToolCall.ts, which deep-strips undefined from the run history before the forced "done" call. Only plain objects/arrays are traversed, so class instances (URL, typed arrays for binary image data, Date, …) pass through untouched.
  • Reverted the force-completed=true fallback from PR #2269 (fix(agent): don't fail a completed run when the forced "done" call errors) in v3AgentHandler.ts, so genuine finalization failures fail loudly again instead of being masked.
  • Added unit coverage in agent-finalization-resilience.test.ts.
  • Added changeset.
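
The deep-strip described in the first bullet might look roughly like the sketch below. This is not the PR's sanitizeMessagesForResubmission(); `stripUndefinedDeep`, `isPlainObject`, and the message shape are assumptions made for illustration. Dropping `undefined` array elements (rather than, say, replacing them with `null`) is also an assumption of this sketch.

```typescript
// Hypothetical helper: true only for object literals and Object.create(null) objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Hypothetical sketch of a deep strip; returns a cleaned copy and never mutates the input.
function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    // Assumption: undefined elements are dropped; the rest are cleaned recursively.
    return value
      .filter((item) => item !== undefined)
      .map((item) => stripUndefinedDeep(item)) as unknown as T;
  }
  if (isPlainObject(value)) {
    const cleaned: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child !== undefined) cleaned[key] = stripUndefinedDeep(child);
    }
    return cleaned as unknown as T;
  }
  // Primitives and class instances (URL, Date, Uint8Array, ...) pass through by reference.
  return value;
}

// Illustrative history entry; the shape is simplified, not the SDK's ModelMessage type.
const imageBytes = new Uint8Array([1, 2, 3]);
const runHistory = [
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        output: { type: "json", value: { matchedExpected: undefined, text: "ok" } },
      },
      { type: "image", data: imageBytes },
    ],
  },
];

const cleanedHistory = stripUndefinedDeep(runHistory);
console.log(cleanedHistory[0].content[1].data === imageBytes); // true
```

Checking the prototype rather than `typeof value === "object"` is what keeps binary payloads intact: a typed array is an object, but rebuilding it key by key would turn it into a plain record of numbers.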

test plan

  • Unit — 4 tests against the real ai SDK: reproduces InvalidPromptError with an undefined tool-result field → fixed by sanitizeMessagesForResubmission → real content (reasoning / tool-call / text) preserved → class instances untouched. All pass.

  • End-to-end — openai/gpt-5.5 + custom tool, mirrors PermitFlow script 038:

    | Branch | Result |
    | --- | --- |
    | main | success=false, red Invalid prompt error |
    | this branch | success=true, completed=true, no error |

changeset-bot Bot commented Jun 24, 2026


⚠️ No Changeset found

Latest commit: d30883b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes changesets to release 3 packages
Name Type
@browserbasehq/stagehand Patch
@browserbasehq/stagehand-evals Patch
@browserbasehq/stagehand-server-v3 Patch


@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 4 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant Agent as V3AgentHandler
    participant Done as handleDoneToolCall
    participant Strip as stripReasoningParts
    participant SDK as AI SDK (generateText)
    participant Provider as Model Provider

    Note over Agent,Provider: Finalization flow when agent finishes all work (state.completed=false)

    Agent->>Done: Call handleDoneToolCall with inputMessages (run history)
    Note over Done: Forced "done" finalization re-submits history to model
    
    Done->>Strip: Strip reasoning parts from inputMessages
    Note over Strip: Filter out assistant content parts with type "reasoning"<br/>Preserve tool-call/tool-result pairing
    
    alt Messages had reasoning parts
        Strip-->>Done: Cleaned messages (reduced length, no reasoning parts)
    else Messages had no reasoning parts
        Strip-->>Done: Messages unchanged (pass-through)
    end

    Done->>SDK: generateText(model, systemPrompt, cleaned messages + userPrompt, doneTool)
    
    alt SDK validation succeeds
        SDK->>Provider: Send validated prompt
        Provider-->>SDK: generateText response
        SDK-->>Done: Done tool result (taskComplete, reasoning)
    else SDK validation fails (e.g., missing text in reasoning part)
        Note over SDK: "Invalid prompt: messages must be a ModelMessage[]"<br/>This error path is now blocked by prior stripping
    end

    Done-->>Agent: { taskComplete, reasoning, output }
    
    alt taskComplete == true
        Agent->>Agent: Mark state.completed = true
        Agent-->>Agent: Return { messages, output }
    else taskComplete == false
        Agent->>Agent: Continue agent loop
    end

    Note over Agent,Provider: Previous behavior (removed in this PR)
    Note over Agent: Removed try/catch fallback that warned and synthesized completion
    Note over Agent: No longer masks the error - strip prevents it from occurring

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7565a93ad


if (m.role !== "assistant" || typeof m.content === "string") return m;
return {
  ...m,
  content: m.content.filter((p) => p.type !== "reasoning"),
};


P1 Badge Preserve Anthropic thinking blocks for tool-use histories

When the main agent runs Claude 4.6+/Fable 5, buildAgentProviderOptions enables Anthropic adaptive/always-on thinking, so tool-use assistant turns can contain signed reasoning blocks. This sanitizer removes those blocks from every assistant message while leaving the tool-call/tool-result history intact; Anthropic requires thinking blocks to be resent unmodified during tool use (see their extended thinking docs), so the forced finalization request can be rejected and, with the surrounding catch removed, the completed agent run is reported as failed. Limit this stripping to the OpenAI/malformed reasoning case or preserve Anthropic signed reasoning blocks.
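
One way to read the second suggestion is a filter that keeps reasoning parts carrying a provider signature and drops the rest. The sketch below is hypothetical: the local types are simplified stand-ins for the SDK's message types, and the location of the signature (`providerOptions.anthropic.signature`) is an assumption that should be checked against the SDK and provider documentation before use.

```typescript
// Simplified local types for illustration; NOT the AI SDK's ModelMessage definitions.
type SketchPart = {
  type: string;
  text?: string;
  // Assumption: a signed thinking block exposes its signature here.
  providerOptions?: { anthropic?: { signature?: string } };
};
type SketchMessage = { role: string; content: string | SketchPart[] };

// Hypothetical predicate: a reasoning part counts as "signed" if a signature string is present.
function isSignedReasoning(part: SketchPart): boolean {
  return (
    part.type === "reasoning" &&
    typeof part.providerOptions?.anthropic?.signature === "string"
  );
}

// Drops only unsigned reasoning parts from assistant messages; everything else is kept.
function stripUnsignedReasoning(messages: SketchMessage[]): SketchMessage[] {
  return messages.map((m) => {
    if (m.role !== "assistant" || typeof m.content === "string") return m;
    return {
      ...m,
      content: m.content.filter(
        (p) => p.type !== "reasoning" || isSignedReasoning(p),
      ),
    };
  });
}
```

Non-assistant messages and string content are returned unchanged, matching the guard in the snippet under review.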


…alization

Root-cause fix for STG-2335. The forced "done" finalization re-submits the
accumulated run history to the model. When a custom tool returns an object
with an optional field left `undefined` (e.g. PermitFlow's captureField
returning `{ matchedExpected: undefined }`), that `undefined` lands inside a
tool-result `output.value`. The AI SDK's prompt validation rejects it — its
JSON-value schema disallows `undefined` — throwing "Invalid prompt: messages
must be a ModelMessage[]". That flipped a completed run to { success: false }
with a red error, even though every action had already succeeded.

Deep-strip `undefined` from the history before the finalization call, keeping
all real content. Class instances (URL, typed arrays, Date) are passed through
untouched so binary image data isn't corrupted.

Replaces PR #2269's best-effort try/catch (which masked the error and forced
completed=true); reverts that fallback in ensureDone and rewrites the test to
cover the undefined-tool-result re-submission path.

Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main,
succeeds on this branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@filip-michalsky filip-michalsky force-pushed the fm/stg-2335-strip-reasoning-done-finalization branch from d7565a9 to d30883b Compare June 24, 2026 09:33

@akeimach akeimach left a comment

Contributor

fabulous

@filip-michalsky filip-michalsky merged commit 22bc215 into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on Jun 24, 2026
115 of 218 checks passed
