
Gemini 3.5 Flash slow or malformed when generating large function-call arguments #1619

@uriva

Summary

When generating a large string inside a function-call argument, a gemini-3.5-flash request can take 50-120s and then either succeed, stop with finishReason MAX_TOKENS, or stop with finishReason MALFORMED_FUNCTION_CALL.

This reproduces with a direct @google/genai call, without any agent framework or application wrapper.

Minimal repro gist:
https://gist.github.com/uriva/6bc9e20d1915c8d655ad3b3558796977

Environment

  • SDK: @google/genai@2.4.0
  • Runtime: Deno
  • Model: gemini-3.5-flash
  • API: Gemini API via API key
  • Call: ai.models.generateContent(request)
  • Streaming: not required to reproduce

Repro shape

The repro sends one user message and one declared tool:

  • Tool: run_command
  • Args schema:
    • command: string
    • params: any
  • Prompt asks the model to call run_command with command: code_execution/write_file
  • params.content should contain a complete single-file HTML landing page

The request itself is very small:

  • About 1.5KB serialized request
  • About 268 prompt tokens in one observed run
  • One user content item
  • One function declaration
  • No previous turns
  • No thought signatures in input

Config:

thinkingConfig: {
  includeThoughts: false,
  thinkingLevel: ThinkingLevel.LOW,
},
maxOutputTokens: 16000,
toolConfig: { functionCallingConfig: {} },
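For readers who do not open the gist, the request described above can be approximated as a plain object. This is a hedged reconstruction, not the gist's code. The prompt wording, the tool description, the `required` list, the use of `parametersJsonSchema` with an empty schema to stand in for `params: any`, and the string "LOW" in place of `ThinkingLevel.LOW` are all assumptions.

```typescript
// Hypothetical prompt wording; the actual text lives in the gist.
const HYPOTHETICAL_PROMPT =
  "Call run_command with command 'code_execution/write_file'. " +
  "params.content must contain a complete single-file HTML landing page.";

// Builds a request with one user message and one declared tool,
// mirroring the shape described in this issue.
function buildReproRequest(promptText: string) {
  return {
    model: "gemini-3.5-flash",
    contents: [{ role: "user", parts: [{ text: promptText }] }],
    config: {
      thinkingConfig: {
        includeThoughts: false,
        thinkingLevel: "LOW", // assumption: same value as ThinkingLevel.LOW
      },
      maxOutputTokens: 16000,
      toolConfig: { functionCallingConfig: {} },
      tools: [
        {
          functionDeclarations: [
            {
              name: "run_command",
              description: "Runs a named command with parameters.", // assumption
              parametersJsonSchema: {
                type: "object",
                properties: {
                  command: { type: "string" },
                  params: {}, // empty schema, i.e. any value (assumption)
                },
                required: ["command", "params"], // assumption
              },
            },
          ],
        },
      ],
    },
  };
}

const request = buildReproRequest(HYPOTHETICAL_PROMPT);
console.log(JSON.stringify(request).length, "bytes serialized");
```

An object of this shape would then be passed to ai.models.generateContent(request), as listed under Environment.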

Observed behavior

Across repeated runs of the same direct SDK request, outcomes vary:

  1. Slow successful function call
    • ~50-60s latency
    • finishReason: STOP
    • returned run_command function call
    • generated ~48KB-56KB JSON function-call args

  2. Token exhaustion
    • ~120s latency
    • finishReason: MAX_TOKENS
    • returned partial, large run_command function-call args of ~56KB-60KB

  3. Malformed function call
    • ~122s latency
    • finishReason: MALFORMED_FUNCTION_CALL
    • no content parts exposed by the SDK response
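When collecting repeated runs, the three outcomes above can be bucketed with a small helper. The sketch below is illustrative only: the `RunSummary` shape follows the summarized JSON used in this issue rather than the raw SDK response, and the 30-second "slow" threshold and the outcome names are assumptions.

```typescript
// Summarized run, in the same shape as the example JSON in this issue.
type RunSummary = {
  elapsedMs: number;
  finishReason: string;
  parts: { functionCall?: string; argBytes?: number }[];
};

type Outcome =
  | "slow_success"
  | "fast_success"
  | "token_exhaustion"
  | "malformed"
  | "other";

// slowThresholdMs is a hypothetical cutoff, not a documented limit.
function classifyRun(run: RunSummary, slowThresholdMs = 30_000): Outcome {
  if (run.finishReason === "MALFORMED_FUNCTION_CALL") return "malformed";
  if (run.finishReason === "MAX_TOKENS") return "token_exhaustion";
  const calledTool = run.parts.some((p) => p.functionCall === "run_command");
  if (run.finishReason === "STOP" && calledTool) {
    return run.elapsedMs >= slowThresholdMs ? "slow_success" : "fast_success";
  }
  return "other";
}

// Tallies outcomes over a batch of runs.
function tally(runs: RunSummary[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of runs) {
    const outcome = classifyRun(run);
    counts[outcome] = (counts[outcome] ?? 0) + 1;
  }
  return counts;
}
```

Feeding each run's summary into `tally` gives a per-outcome count, which makes the variability across identical requests easier to report.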

Example observed malformed run:

{
  "elapsedMs": 122344,
  "finishReason": "MALFORMED_FUNCTION_CALL",
  "usage": {
    "promptTokenCount": 268,
    "totalTokenCount": 268,
    "serviceTier": "standard"
  },
  "parts": []
}

Example observed slow successful run:

{
  "elapsedMs": 50557,
  "finishReason": "STOP",
  "usage": {
    "promptTokenCount": 234,
    "candidatesTokenCount": 12581,
    "totalTokenCount": 13288
  },
  "parts": [
    {
      "functionCall": "run_command",
      "argBytes": 48015
    }
  ]
}
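The figures in the successful run allow a rough back-of-the-envelope check. The sketch below derives bytes per token and tokens per second from that run, then extrapolates to the 16000-token cap. The extrapolation assumes that the bytes-per-token ratio stays constant and that the whole cap is available to the function-call arguments, neither of which is confirmed.

```typescript
// Figures copied from the slow successful run reported above.
const successfulRun = {
  elapsedMs: 50557,
  promptTokenCount: 234,
  candidatesTokenCount: 12581,
  totalTokenCount: 13288,
  argBytes: 48015,
};
const maxOutputTokens = 16000;

function deriveStats(run: typeof successfulRun, tokenCap: number) {
  const bytesPerToken = run.argBytes / run.candidatesTokenCount;
  const tokensPerSecond = run.candidatesTokenCount / (run.elapsedMs / 1000);
  // Tokens in the total that are neither prompt nor candidate tokens.
  const unaccountedTokens =
    run.totalTokenCount - run.promptTokenCount - run.candidatesTokenCount;
  // Assumption: constant bytes-per-token, full cap spent on the arguments.
  const estimatedArgBytesAtCap = Math.round(bytesPerToken * tokenCap);
  // Assumption: the same generation rate as the successful run.
  const estimatedSecondsAtCap = tokenCap / tokensPerSecond;
  return {
    bytesPerToken,
    tokensPerSecond,
    unaccountedTokens,
    estimatedArgBytesAtCap,
    estimatedSecondsAtCap,
  };
}

console.log(deriveStats(successfulRun, maxOutputTokens));
```

This works out to roughly 3.8 bytes per token and about 249 tokens per second. At that ratio, 16000 tokens correspond to about 61KB of arguments, which is in line with the ~56KB-60KB seen in the MAX_TOKENS runs. At that rate the cap would be reached in about 64s, so the ~120s runs appear to have generated more slowly than the successful one. The 473 tokens not covered by the prompt and candidate counts may be thinking tokens, but that is a guess.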

Expected behavior

Large string arguments in function calls should either:

  • complete reliably within reasonable latency, or
  • fail quickly with an actionable error, or
  • expose enough partial output/diagnostics to understand why the function call became malformed.

Writing a large file via a tool argument is a valid agentic workflow. The issue seems specific to generating large structured JSON arguments for a function call, not to selecting the right tool.

Notes

  • This is not caused by our app's agent wrapper; the gist calls GoogleGenAI directly.
  • This is not caused by prior thought signatures; the minimal repro has no prior model turns.
  • This is not about many tools; the minimal repro has one declared function.
  • The model often does select the intended tool and starts generating params.content correctly, but generation is slow and unreliable, and sometimes ends as MAX_TOKENS or MALFORMED_FUNCTION_CALL.

Labels

  • api: gemini-api
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
