fix(vertexai): preserve non-ASCII in tool-call arguments JSON by HumphreySun98 · Pull Request #1823 · langchain-ai/langchain-google

Humphrey (HumphreySun98) · 2026-06-02T21:35:19Z

Description

Two sites in `langchain-google-vertexai` serialize tool-call arguments with bare `json.dumps(...)`. Python's `json` defaults to `ensure_ascii=True`, so every non-ASCII character is escaped to `\uXXXX`. CJK text, emoji, and accented characters in tool-call args end up unreadable in `AIMessage.additional_kwargs["function_call"]["arguments"]` (the `_parse_response_candidate` path used by `ChatVertexAI` / Gemini on Vertex) and in the OpenAI-style payload built for Llama Model Garden.
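
The escaping is easy to reproduce with the standard library alone. The snippet below is a minimal sketch rather than the library code; the `args` dict is a made-up example of tool-call arguments.

```python
import json

# Hypothetical tool-call arguments containing CJK text and an accented word.
args = {"text": "你好", "city": "Zürich"}

# Default behavior: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(args)

# With ensure_ascii=False the characters are written through unchanged.
preserved = json.dumps(args, ensure_ascii=False)

print(escaped)    # {"text": "\u4f60\u597d", "city": "Z\u00fcrich"}
print(preserved)  # {"text": "你好", "city": "Zürich"}
```

Both strings are valid JSON and decode to the same dict; only the human-readable form of the serialized string differs.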

This mirrors the langchain-google-genai fix in #1804 (which addresses the same pattern on the genai side) and the convention already established in `langchain-openai`'s chat model and `langchain-core`'s `messages/utils.py:1810`.

Relevant issues

None filed for the vertexai sites specifically; this is the vertexai analog of the langchain-google-genai issue #1789 / PR #1804.

Type

🐛 Bug Fix

Changes

- `libs/vertexai/langchain_google_vertexai/chat_models.py`: pass `ensure_ascii=False` to the `json.dumps` that produces `function_call["arguments"]` in `_parse_response_candidate`.
- `libs/vertexai/langchain_google_vertexai/model_garden_maas/llama.py`: pass `ensure_ascii=False` to the `json.dumps` that serializes tool-call args into the OpenAI-style payload.
- `libs/vertexai/tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments` asserting on the raw string. The existing parametrized `test_parse_response_candidate` round-trips arguments through `json.loads` for equality and so was blind to the encoding difference.
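
Why a round-trip assertion cannot see this bug can be sketched as follows. The `serialize` helper is hypothetical and only stands in for the `json.dumps` call under test; it is not the actual test from this PR.

```python
import json


def serialize(args: dict, ensure_ascii: bool) -> str:
    """Hypothetical stand-in for the json.dumps call under test."""
    return json.dumps(args, ensure_ascii=ensure_ascii)


args = {"text": "你好"}
before = serialize(args, ensure_ascii=True)   # behavior before the fix
after = serialize(args, ensure_ascii=False)   # behavior after the fix

# Round-trip check: json.loads decodes \uXXXX escapes, so both variants pass.
roundtrip_before = json.loads(before) == args
roundtrip_after = json.loads(after) == args

# Raw-string check: only the un-escaped variant contains the literal characters.
raw_before = "你好" in before
raw_after = "你好" in after

print(roundtrip_before, roundtrip_after)  # True True
print(raw_before, raw_after)              # False True
```

Asserting on the raw string is therefore the only check of the two that distinguishes the old output from the new one.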

Testing

```
$ uv run pytest libs/vertexai/tests/unit_tests/test_chat_models.py -k parse_response_candidate -q
..............
14 passed, 96 deselected in 6.48s
```

Reverting only the one-keyword `chat_models.py` change while keeping the new test makes the test fail (`assert "你好" in '{"text": "\\u4f60\\u597d", ...}'` → AssertionError), confirming that the regression test catches the buggy behavior.

`ruff check` and `ruff format --check` pass on all three files.

Note

The fix is intentionally minimal — a single keyword in each of two `json.dumps` calls. No behavior changes for callers reading `tool_calls[i]["args"]` (which already round-trips through `parse_tool_calls` and so has always been a clean dict). The change only affects the legacy `additional_kwargs["function_call"]["arguments"]` string and the OpenAI-style `tool_calls[*].function.arguments` field.
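
To illustrate the second affected field, here is a minimal sketch of an OpenAI-style tool-call entry. The `to_openai_tool_call` helper, the tool name, and the call id are hypothetical and are not taken from `llama.py`; only the `tool_calls[*].function.arguments` shape and the `ensure_ascii=False` keyword reflect the change described above.

```python
import json
from typing import Any


def to_openai_tool_call(name: str, args: dict[str, Any], call_id: str) -> dict[str, Any]:
    """Hypothetical helper: build one OpenAI-style tool_calls entry."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": name,
            # Arguments travel as a JSON string; keep non-ASCII characters readable.
            "arguments": json.dumps(args, ensure_ascii=False),
        },
    }


call = to_openai_tool_call("translate", {"text": "你好 👋"}, "call_0")
arguments = call["function"]["arguments"]

# Consumers that parse the string get the same dict either way.
parsed = json.loads(arguments)

print(arguments)  # {"text": "你好 👋"}
```

Because parsing yields an identical dict with or without escaping, only code that inspects or displays the raw `arguments` string observes a difference.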

Disclaimer: this PR was prepared with the assistance of an AI agent (Claude Code). All code and test changes were reviewed by the author before submission.

Two sites in `langchain-google-vertexai` serialized tool-call arguments with bare `json.dumps(...)`, which defaults to `ensure_ascii=True` and escapes every non-ASCII character to `\uXXXX`. CJK text, emoji, and accented characters in tool-call args ended up unreadable in `AIMessage.additional_kwargs["function_call"]["arguments"]` (the `_parse_response_candidate` path used by `ChatVertexAI` / Gemini on Vertex) and in the OpenAI-style payload built for Llama Model Garden.

This mirrors the langchain-google-genai fix in PR langchain-ai#1804 and the existing convention in `langchain-openai`'s chat model and `langchain-core`'s `messages/utils.py:1810`.

- `chat_models.py:818`: tool-call args round-tripped to dict via `proto.Message.to_dict(...)["args"]`, then serialized for `additional_kwargs`. Pass `ensure_ascii=False`.
- `model_garden_maas/llama.py:296`: tool-call args serialized into the OpenAI-style `tool_calls[*].function.arguments` field. Pass `ensure_ascii=False`.
- `tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments` asserting on the raw string. The existing parametrized `test_parse_response_candidate` round-trips through `json.loads` and is blind to the encoding difference.

Humphrey (HumphreySun98) · 2026-06-03T15:31:49Z

Heads-up for reviewers: the `langchain-google-vertexai-us (llm-integration-tests)` red mark is unrelated to this PR.

This is the same Cloud Build live-API failure that's currently hitting my parallel vertexai PR #1820 (a one-line change to `_thinking_in_params` that purely swaps a string comparison, with no path through which it could regress integration tests). It also matches the precedent of merged PR #1730, which shipped with the same red mark.

This PR only touches `json.dumps(...)` encoding in tool-call argument serialization paths; all GitHub Actions (build, lint, test 3.10–3.14, compile integration tests) are green.

Disclaimer: this comment was prepared with the assistance of an AI agent (Claude Code).

…ation

`BigQueryCallbackHandler` (and the langgraph/async variants) build the content for BigQuery JSON columns with bare `json.dumps(...)`. Python's default `ensure_ascii=True` escapes every non-ASCII character to `\uXXXX`, so CJK / emoji / accented text from chain inputs, outputs, documents, tool calls, agent actions, and langgraph attributes lands in storage as escape sequences and is unreadable when inspecting the BigQuery row directly.

Pass `ensure_ascii=False` at every `json.dumps` site in `callbacks/bigquery_callback.py` and add unit-test coverage on `_prepare_arrow_batch` asserting CJK and emoji round-trip into the resulting `pa.RecordBatch`. The convention matches what `langchain-openai`, `langchain-core` (`messages/utils.py:1810`), and our just-shipped genai/vertexai `_parse_response_candidate` fixes (langchain-ai#1804, langchain-ai#1823) already use.

Humphrey (HumphreySun98) mentioned this pull request Jun 5, 2026

fix(community): preserve non-ASCII in BigQuery callback JSON serialization #1827

Open

Humphrey (HumphreySun98) wants to merge 1 commit into langchain-ai:main from HumphreySun98:fix/vertexai-tool-call-args-unicode-escape
