7 changes: 7 additions & 0 deletions .changeset/curvy-pillows-attack.md
@@ -0,0 +1,7 @@
---
"@browserbasehq/stagehand": patch
---

Fix non-CUA `agent.execute()` reporting a successful run as failed. After the agent finishes its work, the forced "done" finalization step (`handleDoneToolCall`) re-submits the accumulated run history to the model. If a custom tool returned an object with an optional field left `undefined` (e.g. `{ matchedExpected: undefined }`), that `undefined` ended up inside a tool-result `output.value`. The AI SDK's prompt validation rejects such a value because its JSON-value schema disallows `undefined`, so the call threw `Invalid prompt: messages must be a ModelMessage[]`. The error surfaced in red and flipped the result to `{ success: false }` even though every action had already completed (STG-2335).

Root cause fix: deep-strip `undefined` values from the run history before re-submitting it to the forced "done" finalization call, keeping the messages valid without dropping any real content. Class instances (URL, typed arrays for binary data, etc.) are passed through untouched.
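
To illustrate the failure mode described above, here is a minimal sketch of a JSON-value check. `isJsonValue` and the `JsonValue` type are hypothetical stand-ins written for this example; they only approximate the rule stated in the changeset (null/string/number/boolean/object/array allowed, `undefined` rejected) and are not the AI SDK's actual schema.

```typescript
// Hypothetical approximation of a "JSON value" rule; NOT the AI SDK's real schema.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      return true;
    case "object":
      if (Array.isArray(value)) return value.every(isJsonValue);
      // Every property value must itself be a JSON value.
      return Object.values(value as Record<string, unknown>).every(isJsonValue);
    default:
      // undefined, function, symbol, bigint
      return false;
  }
}

// Shape of the tool result from the bug report: one optional field left undefined.
const withUndefined = { success: true, value: "permit", matchedExpected: undefined };
const withoutUndefined = { success: true, value: "permit" };

const rejected = !isJsonValue(withUndefined); // the undefined property fails the check
const accepted = isJsonValue(withoutUndefined); // same data minus the undefined key passes

// JSON.stringify silently omits undefined properties, so the serialized form
// looks harmless even though the in-memory object fails a strict check.
const serialized = JSON.stringify(withUndefined);
```

The last line is one reason this class of bug is easy to miss: the object serializes cleanly, yet a validator that inspects the in-memory value still sees the `undefined` property.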
44 changes: 42 additions & 2 deletions packages/core/lib/v3/agent/utils/handleDoneToolCall.ts
@@ -17,6 +17,46 @@ interface DoneResult {
output?: Record<string, unknown>;
}

/**
* Deep-remove `undefined` values from the run history before it is re-submitted
* to the forced "done" call.
*
* The AI SDK validates re-submitted messages against its `ModelMessage` schema,
* whose JSON-value schema rejects `undefined` (only null/string/number/boolean/
* object/array are allowed). A custom tool that returns an object with an
* optional field left `undefined` (e.g. `{ matchedExpected: undefined }`) lands
* that `undefined` inside a tool-result `output.value`, and reasoning models can
* leave `undefined` in part fields too. Either makes the forced "done" call
* throw `Invalid prompt: messages must be a ModelMessage[]` (STG-2335) — a red
* error that fires after the run has already completed. Stripping `undefined`
* keeps the history valid without dropping any real content.
*
* Only plain objects and arrays are traversed; class instances (URL, typed
* arrays for binary image data, Date, …) are passed through untouched.
*/
function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((v) => stripUndefinedDeep(v)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    if (proto === Object.prototype || proto === null) {
      const out: Record<string, unknown> = {};
      for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
        if (v !== undefined) out[k] = stripUndefinedDeep(v);
      }
      return out as T;
    }
  }
  return value;
}

export function sanitizeMessagesForResubmission(
  messages: ModelMessage[],
): ModelMessage[] {
  return stripUndefinedDeep(messages);
}

function buildBaseDoneSchema(factory: typeof z) {
return factory.object({
reasoning: factory
@@ -114,7 +154,7 @@ Call the "done" tool with:
const result = await generateText({
model,
system: systemPrompt,
- messages: [...inputMessages, userPrompt],
+ messages: [...sanitizeMessagesForResubmission(inputMessages), userPrompt],
tools: { done: doneTool } as ToolSet,
toolChoice: rejectsForcedToolUse(modelId)
? "auto"
@@ -126,7 +166,7 @@ Call the "done" tool with:
},
});

- const doneToolCall = result.toolCalls.find((tc) => tc.toolName === "done");
+ const doneToolCall = result.toolCalls?.find((tc) => tc.toolName === "done");
const outputMessages: ModelMessage[] = [
userPrompt,
...(result.response?.messages || []),
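
The doc comment on `stripUndefinedDeep` says only plain objects and arrays are traversed. The sketch below isolates that prototype check and contrasts it with a JSON round-trip, which would also drop `undefined` but would damage binary data along the way. `isPlainObject` is a hypothetical helper named for this example; the diff inlines the same check rather than exporting it.

```typescript
// Hypothetical helper mirroring the prototype check inlined in stripUndefinedDeep.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  // Object literals have Object.prototype; Object.create(null) has no prototype.
  return proto === Object.prototype || proto === null;
}

const bytes = new Uint8Array([1, 2, 3]);

// Values that would be traversed and rebuilt without undefined keys.
const traversed = [isPlainObject({ a: 1 }), isPlainObject(Object.create(null))];

// Values that are returned by reference, untouched. Arrays are not plain
// objects either; the diff handles them in a separate Array.isArray branch.
const passedThrough = [
  isPlainObject(bytes),
  isPlainObject(new Date(0)),
  isPlainObject([1, 2, 3]),
];

// Contrast: a JSON round-trip turns the typed array into an index-keyed object,
// so binary payloads would no longer be usable as Uint8Array.
const roundTripped = JSON.parse(JSON.stringify({ data: bytes })) as { data: unknown };
const survivedRoundTrip = roundTripped.data instanceof Uint8Array;
```

Under these assumptions, the prototype check is what lets the sanitizer remove `undefined` while leaving class instances such as typed arrays and dates exactly as they were.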
34 changes: 27 additions & 7 deletions packages/core/lib/v3/handlers/v3AgentHandler.ts
@@ -831,13 +831,33 @@ export class V3AgentHandler {
): Promise<{ messages: ModelMessage[]; output?: Record<string, unknown> }> {
if (state.completed) return { messages };

- const doneResult = await handleDoneToolCall({
- model,
- inputMessages: messages,
- instruction,
- outputSchema,
- logger,
- });
+ let doneResult: Awaited<ReturnType<typeof handleDoneToolCall>>;
+ try {
+ doneResult = await handleDoneToolCall({
+ model,
+ inputMessages: messages,
+ instruction,
+ outputSchema,
+ logger,
+ });
+ } catch (error) {
+ // The forced "done" call only summarizes the run, so its failure must not
+ // fail a run whose work already completed (e.g. a provider rejecting the
+ // re-submitted history). Warn and synthesize a completion. We log only the
+ // message, not the cause, because the cause embeds the full history (base64
+ // images included) and would bloat the log.
+ logger?.({
+ category: "agent",
+ level: 1,
+ message: `Agent "done" finalization call failed; using run summary instead: ${getErrorMessage(error)}`,
+ });
+ state.completed = true;
+ state.finalMessage =
+ state.finalMessage ||
+ state.collectedReasoning.join(" ").trim() ||
+ "Task execution completed";
+ return { messages };
+ }

state.completed = doneResult.taskComplete;
state.finalMessage = doneResult.reasoning;
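
The `catch` branch above picks a final message through a three-step fallback chain. A minimal sketch of that precedence follows, assuming a reduced `RunState` shape with only the two fields the chain reads; `RunState` and `fallbackFinalMessage` are hypothetical names for this example and do not exist in the handler.

```typescript
// Hypothetical, reduced view of the handler state; only the fields the fallback reads.
interface RunState {
  finalMessage: string;
  collectedReasoning: string[];
}

// Same precedence as the catch branch: existing final message, then joined
// reasoning, then a fixed default. Empty strings are falsy, so `||` skips them.
function fallbackFinalMessage(state: RunState): string {
  return (
    state.finalMessage ||
    state.collectedReasoning.join(" ").trim() ||
    "Task execution completed"
  );
}

const keepsExisting = fallbackFinalMessage({
  finalMessage: "Saved the form",
  collectedReasoning: ["Opened page."],
});

const usesReasoning = fallbackFinalMessage({
  finalMessage: "",
  collectedReasoning: ["Opened page.", "Clicked submit."],
});

// Whitespace-only reasoning trims to "" and falls through to the default.
const usesDefault = fallbackFinalMessage({
  finalMessage: "",
  collectedReasoning: ["  "],
});
```

Because the chain uses `||` rather than `??`, an empty `finalMessage` is treated the same as a missing one, which appears to be the intent here: the run should always end with a non-empty summary.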
118 changes: 118 additions & 0 deletions packages/core/tests/unit/agent-finalization-resilience.test.ts
@@ -0,0 +1,118 @@
import { describe, expect, it } from "vitest";
import { generateText, type ModelMessage } from "ai";
import type { LanguageModelV2 } from "@ai-sdk/provider";
import { sanitizeMessagesForResubmission } from "../../lib/v3/agent/utils/handleDoneToolCall.js";

// A minimal mock model. generateText runs the AI SDK's prompt validation
// (standardizePrompt) before ever reaching the model, so this is enough to
// reproduce STG-2335: the forced "done" finalization re-submits the run
// history, and a tool-result whose output value contains an `undefined` field
// trips that validation with "Invalid prompt: messages must be a
// ModelMessage[]" — the AI SDK's JSON-value schema rejects `undefined`.
const mockModel = {
specificationVersion: "v2",
provider: "mock",
modelId: "mock",
supportedUrls: {},
async doGenerate() {
return {
finishReason: "stop" as const,
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
content: [{ type: "text" as const, text: "ok" }],
warnings: [] as [],
};
},
} as unknown as LanguageModelV2;

// Mirrors PermitFlow's custom tool: an optional field (`matchedExpected`) is
// left `undefined` in the tool result, so the re-submitted tool-result carries
// an `undefined` inside output.value. Also includes a valid reasoning part
// (text: "") to confirm sanitization leaves real content intact.
const malformedHistory = [
{ role: "user", content: "do the task" },
{
role: "assistant",
content: [
{
type: "reasoning",
text: "",
providerOptions: { openai: { itemId: "rs_1" } },
},
{
type: "tool-call",
toolCallId: "c1",
toolName: "captureField",
input: {},
},
],
},
{
role: "tool",
content: [
{
type: "tool-result",
toolCallId: "c1",
toolName: "captureField",
output: {
type: "json",
value: { success: true, value: "permit", matchedExpected: undefined },
},
},
],
},
] as unknown as ModelMessage[];

describe("v3 agent finalization: tool-result re-submission (STG-2335)", () => {
it("reproduces the InvalidPromptError when an undefined tool-result field is re-submitted", async () => {
await expect(
generateText({ model: mockModel, messages: malformedHistory }),
).rejects.toThrow(/must be a ModelMessage\[\]/);
});

it("re-submission succeeds once undefined values are stripped", async () => {
const result = await generateText({
model: mockModel,
messages: sanitizeMessagesForResubmission(malformedHistory),
});
expect(result.text).toBe("ok");
});

it("drops undefined fields but preserves all real content", () => {
const cleaned = sanitizeMessagesForResubmission(malformedHistory);

expect(cleaned).toHaveLength(3);
expect(cleaned[0]).toEqual({ role: "user", content: "do the task" });

// reasoning + tool-call survive on the assistant message.
const assistant = cleaned[1].content as Array<{ type: string }>;
expect(assistant.map((p) => p.type)).toEqual(["reasoning", "tool-call"]);

// tool-result value keeps real fields, drops the undefined one.
const toolResult = (
cleaned[2].content as Array<{
output: { value: Record<string, unknown> };
}>
)[0];
expect(toolResult.output.value).toEqual({ success: true, value: "permit" });
expect("matchedExpected" in toolResult.output.value).toBe(false);
});

it("leaves class instances (URL, typed arrays) untouched", () => {
const url = new URL("https://example.com");
const bytes = new Uint8Array([1, 2, 3]);
const messages = [
{
role: "user",
content: [
{ type: "file", data: url, mediaType: "text/plain" },
{ type: "file", data: bytes, mediaType: "application/octet-stream" },
],
},
] as unknown as ModelMessage[];

const cleaned = sanitizeMessagesForResubmission(messages);
const parts = cleaned[0].content as Array<{ data: unknown }>;
expect(parts[0].data).toBeInstanceOf(URL);
expect(parts[1].data).toBeInstanceOf(Uint8Array);
});
});
58 changes: 58 additions & 0 deletions packages/core/tests/unit/sanitize-resubmission-messages.test.ts
@@ -0,0 +1,58 @@
import { describe, expect, it } from "vitest";
import type { ModelMessage } from "ai";
import { sanitizeMessagesForResubmission } from "../../lib/v3/agent/utils/handleDoneToolCall.js";

describe("sanitizeMessagesForResubmission", () => {
it("strips nested undefined from providerOptions (the gpt-5.x failure)", () => {
const messages = [
{
role: "assistant",
content: [
{
type: "reasoning",
text: "",
// jsonValueSchema rejects undefined, so this is what breaks
// standardizePrompt on re-submission.
providerOptions: {
openai: { itemId: "rs_1", reasoningEncryptedContent: undefined },
},
},
],
},
] as unknown as ModelMessage[];

const [msg] = sanitizeMessagesForResubmission(messages);
const part = (
msg.content as unknown as { providerOptions: { openai: object } }[]
)[0];

expect(part.providerOptions.openai).toEqual({ itemId: "rs_1" });
expect("reasoningEncryptedContent" in part.providerOptions.openai).toBe(
false,
);
});

it("preserves null, primitives, and string content unchanged", () => {
const messages = [
{ role: "system", content: "you are an agent" },
{ role: "user", content: "hi" },
] as unknown as ModelMessage[];

expect(sanitizeMessagesForResubmission(messages)).toEqual(messages);
});

it("does not mutate the input messages", () => {
const messages = [
{
role: "assistant",
content: [{ type: "text", text: "x", providerOptions: undefined }],
},
] as unknown as ModelMessage[];

sanitizeMessagesForResubmission(messages);
const original = (
messages[0].content as { providerOptions?: unknown }[]
)[0];
expect("providerOptions" in original).toBe(true);
});
});