Merged
4 changes: 2 additions & 2 deletions .changeset/curvy-pillows-attack.md
@@ -2,6 +2,6 @@
 "@browserbasehq/stagehand": patch
 ---
 
-Fix non-CUA `agent.execute()` reporting a successful run as failed. After the agent finished all of its work, the forced "done" finalization step (`handleDoneToolCall`) re-submitted the accumulated run history to the model; with some providers (e.g. reasoning models like `openai/gpt-5.x`) that history is rejected by the AI SDK (`Invalid prompt: messages must be a ModelMessage[]`), which surfaced as a red error and flipped the result to `{ success: false }` even though every action had already completed.
+Fix non-CUA `agent.execute()` reporting a successful run as failed. After the agent finished all of its work, the forced "done" finalization step (`handleDoneToolCall`) re-submitted the accumulated run history to the model. When a custom tool returned an object with an optional field left `undefined` (e.g. `{ matchedExpected: undefined }`), that `undefined` ended up inside a tool-result `output.value`, which the AI SDK's prompt validation rejects (its JSON-value schema disallows `undefined`), throwing `Invalid prompt: messages must be a ModelMessage[]`. This surfaced as a red error and flipped the result to `{ success: false }` even though every action had already completed (STG-2335).
 
-The finalization "done" call is now best-effort: if it throws, the agent logs a warning (with the underlying cause) and synthesizes a completion from the run instead of failing it. Also hardens `handleDoneToolCall` against a missing `toolCalls` array.
+Root cause fix: deep-strip `undefined` values from the run history before re-submitting it to the forced "done" finalization call, keeping the messages valid without dropping any real content. Class instances (URL, typed arrays for binary data, etc.) are passed through untouched.
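
To make the behavior described in the changeset concrete, here is a minimal, self-contained sketch. The helper body mirrors `stripUndefinedDeep` from this PR; the `history` object is a hypothetical, simplified stand-in for a tool-result message and is not the AI SDK's real `ModelMessage` type.

```typescript
// Same logic as the helper added in this PR, reproduced so the sketch runs on its own.
function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((v) => stripUndefinedDeep(v)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    // Only plain objects are rebuilt; class instances fall through unchanged.
    if (proto === Object.prototype || proto === null) {
      const out: Record<string, unknown> = {};
      for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
        if (v !== undefined) out[k] = stripUndefinedDeep(v);
      }
      return out as T;
    }
  }
  return value;
}

// Hypothetical history entry: a custom tool left an optional field undefined.
const link = new URL("https://example.com/");
const history = [
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        output: { type: "json", value: { matchedExpected: undefined, count: 2 } },
        source: link, // class instance, expected to pass through by reference
      },
    ],
  },
];

const cleaned = stripUndefinedDeep(history);
const part = cleaned[0].content[0];

console.log("matchedExpected" in part.output.value); // false: the key is dropped
console.log(part.source === link); // true: the URL instance is untouched
console.log("matchedExpected" in history[0].content[0].output.value); // true: input not mutated
console.log(stripUndefinedDeep([1, undefined, 2]).length); // 3: array slots are kept
```

The last line shows a property of the helper as written: it drops `undefined` object properties, but an `undefined` array element keeps its slot because `map` preserves positions. Whether such elements can occur in real run history is not something this PR states.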
42 changes: 41 additions & 1 deletion packages/core/lib/v3/agent/utils/handleDoneToolCall.ts
@@ -17,6 +17,46 @@ interface DoneResult {
   output?: Record<string, unknown>;
 }
 
+/**
+ * Deep-remove `undefined` values from the run history before it is re-submitted
+ * to the forced "done" call.
+ *
+ * The AI SDK validates re-submitted messages against its `ModelMessage` schema,
+ * whose JSON-value schema rejects `undefined` (only null/string/number/boolean/
+ * object/array are allowed). A custom tool that returns an object with an
+ * optional field left `undefined` (e.g. `{ matchedExpected: undefined }`) lands
+ * that `undefined` inside a tool-result `output.value`, and reasoning models can
+ * leave `undefined` in part fields too. Either makes the forced "done" call
+ * throw `Invalid prompt: messages must be a ModelMessage[]` (STG-2335) — a red
+ * error that fires after the run has already completed. Stripping `undefined`
+ * keeps the history valid without dropping any real content.
+ *
+ * Only plain objects and arrays are traversed; class instances (URL, typed
+ * arrays for binary image data, Date, …) are passed through untouched.
+ */
+function stripUndefinedDeep<T>(value: T): T {
+  if (Array.isArray(value)) {
+    return value.map((v) => stripUndefinedDeep(v)) as unknown as T;
+  }
+  if (value !== null && typeof value === "object") {
+    const proto = Object.getPrototypeOf(value);
+    if (proto === Object.prototype || proto === null) {
+      const out: Record<string, unknown> = {};
+      for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
+        if (v !== undefined) out[k] = stripUndefinedDeep(v);
+      }
+      return out as T;
+    }
+  }
+  return value;
+}
+
+export function sanitizeMessagesForResubmission(
+  messages: ModelMessage[],
+): ModelMessage[] {
+  return stripUndefinedDeep(messages);
+}
+
 function buildBaseDoneSchema(factory: typeof z) {
   return factory.object({
     reasoning: factory
@@ -114,7 +154,7 @@ Call the "done" tool with:
   const result = await generateText({
     model,
     system: systemPrompt,
-    messages: [...inputMessages, userPrompt],
+    messages: [...sanitizeMessagesForResubmission(inputMessages), userPrompt],
     tools: { done: doneTool } as ToolSet,
     toolChoice: rejectsForcedToolUse(modelId)
       ? "auto"
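
To illustrate why the history is sanitized before it reaches `generateText`, the sketch below uses a hypothetical `isJsonValue` check. It is a simplified stand-in for the kind of JSON-value validation the doc comment describes, not the AI SDK's actual schema or code.

```typescript
// Hypothetical JSON-value type and checker, written only for illustration.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean") return true;
  if (Array.isArray(value)) return value.every((v) => isJsonValue(v));
  if (t === "object") {
    // Every property value must itself be a JSON value.
    return Object.values(value as Record<string, unknown>).every((v) =>
      isJsonValue(v),
    );
  }
  return false; // undefined, functions, symbols, bigint
}

// A tool output with an optional field left undefined fails the check...
console.log(isJsonValue({ matchedExpected: undefined })); // false
// ...while the same object without that key, or with null, passes.
console.log(isJsonValue({})); // true
console.log(isJsonValue({ matchedExpected: null })); // true
```

Under this reading, removing the `undefined` key (rather than converting it to `null`) is the smallest change that makes the value pass while leaving every defined field as the tool returned it.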
34 changes: 7 additions & 27 deletions packages/core/lib/v3/handlers/v3AgentHandler.ts
@@ -831,33 +831,13 @@ export class V3AgentHandler {
   ): Promise<{ messages: ModelMessage[]; output?: Record<string, unknown> }> {
     if (state.completed) return { messages };
 
-    let doneResult: Awaited<ReturnType<typeof handleDoneToolCall>>;
-    try {
-      doneResult = await handleDoneToolCall({
-        model,
-        inputMessages: messages,
-        instruction,
-        outputSchema,
-        logger,
-      });
-    } catch (error) {
-      // The forced "done" call only summarizes the run, so its failure must not
-      // fail a run whose work already completed (e.g. a provider rejecting the
-      // re-submitted history). Warn and synthesize a completion. We log only the
-      // message, not the cause — the cause embeds the full history (base64
-      // images included) and would bloat the log.
-      logger?.({
-        category: "agent",
-        level: 1,
-        message: `Agent "done" finalization call failed; using run summary instead: ${getErrorMessage(error)}`,
-      });
-      state.completed = true;
-      state.finalMessage =
-        state.finalMessage ||
-        state.collectedReasoning.join(" ").trim() ||
-        "Task execution completed";
-      return { messages };
-    }
+    const doneResult = await handleDoneToolCall({
+      model,
+      inputMessages: messages,
+      instruction,
+      outputSchema,
+      logger,
+    });
 
     state.completed = doneResult.taskComplete;
     state.finalMessage = doneResult.reasoning;