fix(vertexai): preserve non-ASCII in tool-call arguments JSON#1823

Open
Humphrey (HumphreySun98) wants to merge 1 commit into
langchain-ai:mainfrom
HumphreySun98:fix/vertexai-tool-call-args-unicode-escape

@HumphreySun98

Description

Two sites in `langchain-google-vertexai` serialize tool-call arguments with bare `json.dumps(...)`. Python's `json` defaults to `ensure_ascii=True`, so every non-ASCII character is escaped to `\uXXXX`. CJK text, emoji, and accented characters in tool-call args end up unreadable in `AIMessage.additional_kwargs["function_call"]["arguments"]` (the `_parse_response_candidate` path used by `ChatVertexAI` / Gemini on Vertex) and in the OpenAI-style payload built for Llama Model Garden.
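For reviewers who want to see the effect in isolation, here is a minimal standard-library sketch. The `args` dict is a made-up example, not taken from the test suite; only `json.dumps` and its `ensure_ascii` parameter come from the change itself.

```python
import json

# Hypothetical tool-call arguments containing CJK and accented text.
args = {"text": "你好", "city": "Zürich"}

# Default behaviour (ensure_ascii=True): non-ASCII becomes \uXXXX escapes.
escaped = json.dumps(args)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(args, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings are valid JSON and decode to the same dict; the difference is only in how the raw string reads.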

This mirrors the langchain-google-genai fix in #1804 (which addresses the same pattern on the genai side) and the convention already established in `langchain-openai`'s chat model and `langchain-core`'s `messages/utils.py:1810`.

Relevant issues

None filed for the vertexai sites specifically; this is the vertexai analog of the langchain-google-genai issue #1789 / PR #1804.

Type

🐛 Bug Fix

Changes

  • `libs/vertexai/langchain_google_vertexai/chat_models.py`: pass `ensure_ascii=False` to the `json.dumps` that produces `function_call["arguments"]` in `_parse_response_candidate`.
  • `libs/vertexai/langchain_google_vertexai/model_garden_maas/llama.py`: pass `ensure_ascii=False` to the `json.dumps` that serializes tool-call args into the OpenAI-style payload.
  • `libs/vertexai/tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments` asserting on the raw string. The existing parametrized `test_parse_response_candidate` round-trips arguments through `json.loads` for equality and so was blind to the encoding difference.
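
To illustrate why the round-trip assertion could not catch this, a rough sketch follows. `serialize_args` is a hypothetical stand-in for the `json.dumps` call site, not the real `_parse_response_candidate`, and the checks are simplified versions of the two assertion styles.

```python
import json


def serialize_args(args: dict, *, ensure_ascii: bool) -> str:
    """Hypothetical stand-in for the json.dumps call being changed."""
    return json.dumps(args, ensure_ascii=ensure_ascii)


args = {"text": "你好"}
before_fix = serialize_args(args, ensure_ascii=True)
after_fix = serialize_args(args, ensure_ascii=False)

# Round-trip style check: json.loads decodes the escapes, so both
# variants compare equal to the original dict.
roundtrip_before = json.loads(before_fix) == args
roundtrip_after = json.loads(after_fix) == args

# Raw-string style check: only the fixed variant contains the literal text.
raw_before = "你好" in before_fix
raw_after = "你好" in after_fix
```

The round-trip check passes both before and after the fix, which is why the new test asserts on the raw string instead.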

Testing

```
$ uv run pytest libs/vertexai/tests/unit_tests/test_chat_models.py -k parse_response_candidate -q
..............
14 passed, 96 deselected in 6.48s
```

Reverting only the one-keyword `chat_models.py` change while keeping the new test makes the test fail (`assert "你好" in '{"text": "\\u4f60\\u597d", ...}'` → AssertionError), confirming that the regression test catches the buggy behavior.

`ruff check` and `ruff format --check` pass on all three files.

Note

The fix is intentionally minimal — a single keyword in each of two `json.dumps` calls. No behavior changes for callers reading `tool_calls[i]["args"]` (which already round-trips through `parse_tool_calls` and so has always been a clean dict). The change only affects the legacy `additional_kwargs["function_call"]["arguments"]` string and the OpenAI-style `tool_calls[*].function.arguments` field.
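
As a rough picture of the second affected field, the sketch below builds an OpenAI-style tool-call entry. `build_tool_call`, the call id, and the tool name are hypothetical and only illustrate the payload shape; this is not the code in `llama.py`.

```python
import json


def build_tool_call(call_id: str, name: str, args: dict) -> dict:
    """Hypothetical helper showing the OpenAI-style tool-call shape."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": name,
            # The arguments field is a JSON string, not a dict.
            "arguments": json.dumps(args, ensure_ascii=False),
        },
    }


payload = build_tool_call("call_1", "translate", {"text": "café"})
arguments = payload["function"]["arguments"]

# Consumers that parse the string get the same dict as before the change.
parsed = json.loads(arguments)
```

Anything that calls `json.loads` on `arguments` sees no difference; only code or people reading the raw string do.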

Disclaimer: this PR was prepared with the assistance of an AI agent (Claude Code). All code and test changes were reviewed by the author before submission.

Two sites in `langchain-google-vertexai` serialized tool-call arguments
with bare `json.dumps(...)`, which defaults to `ensure_ascii=True` and
escapes every non-ASCII character to `\uXXXX`. CJK text, emoji, and
accented characters in tool-call args ended up unreadable in
`AIMessage.additional_kwargs["function_call"]["arguments"]` (the
`_parse_response_candidate` path used by `ChatVertexAI` / Gemini on
Vertex) and in the OpenAI-style payload built for Llama Model Garden.

This mirrors the langchain-google-genai fix in PR langchain-ai#1804 and the existing
convention in `langchain-openai`'s chat model and
`langchain-core`'s `messages/utils.py:1810`.

- `chat_models.py:818`: tool-call args round-tripped to dict via
  `proto.Message.to_dict(...)["args"]`, then serialized for
  `additional_kwargs`. Pass `ensure_ascii=False`.
- `model_garden_maas/llama.py:296`: tool-call args serialized into the
  OpenAI-style `tool_calls[*].function.arguments` field. Pass
  `ensure_ascii=False`.
- `tests/unit_tests/test_chat_models.py`: add
  `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments`
  asserting on the raw string. The existing parametrized
  `test_parse_response_candidate` round-trips through `json.loads` and
  is blind to the encoding difference.
@HumphreySun98

Author

Heads-up for reviewers: the langchain-google-vertexai-us (llm-integration-tests) red mark is unrelated to this PR.

This is the same Cloud Build live-API failure that's currently hitting my parallel vertexai PR #1820 (a one-line change to `_thinking_in_params` that only swaps a string comparison, so there is no path through which it could regress integration tests). It also matches the precedent of merged PR #1730, which shipped with the same red mark.

This PR only touches `json.dumps(...)` encoding in tool-call argument serialization paths; all GitHub Actions (build, lint, test 3.10–3.14, compile integration tests) are green.

Disclaimer: this comment was prepared with the assistance of an AI agent (Claude Code).

Humphrey (HumphreySun98) added a commit to HumphreySun98/langchain-google that referenced this pull request Jun 8, 2026
…ation

`BigQueryCallbackHandler` (and the langgraph/async variants) build the
content for BigQuery JSON columns with bare `json.dumps(...)`. Python's
default `ensure_ascii=True` escapes every non-ASCII character to
`\uXXXX`, so CJK / emoji / accented text from chain inputs, outputs,
documents, tool calls, agent actions, and langgraph attributes lands in
storage as escape sequences and is unreadable when inspecting the
BigQuery row directly.

Pass `ensure_ascii=False` at every `json.dumps` site in
`callbacks/bigquery_callback.py` and add unit-test coverage on
`_prepare_arrow_batch` asserting CJK and emoji round-trip into the
resulting `pa.RecordBatch`.

The convention matches what `langchain-openai`, `langchain-core`
(`messages/utils.py:1810`), and our just-shipped genai/vertexai
`_parse_response_candidate` fixes (langchain-ai#1804, langchain-ai#1823) already use.