fix(vertexai): preserve non-ASCII in tool-call arguments JSON #1823
Open
Humphrey (HumphreySun98) wants to merge 1 commit into
Two sites in `langchain-google-vertexai` serialized tool-call arguments with bare `json.dumps(...)`, which defaults to `ensure_ascii=True` and escapes every non-ASCII character to `\uXXXX`. CJK text, emoji, and accented characters in tool-call args ended up unreadable in `AIMessage.additional_kwargs["function_call"]["arguments"]` (the `_parse_response_candidate` path used by `ChatVertexAI` / Gemini on Vertex) and in the OpenAI-style payload built for Llama Model Garden.
This mirrors the langchain-google-genai fix in PR langchain-ai#1804 and the existing convention in `langchain-openai`'s chat model and `langchain-core`'s `messages/utils.py:1810`.
- `chat_models.py:818`: tool-call args round-tripped to dict via `proto.Message.to_dict(...)["args"]`, then serialized for `additional_kwargs`. Pass `ensure_ascii=False`.
- `model_garden_maas/llama.py:296`: tool-call args serialized into the OpenAI-style `tool_calls[*].function.arguments` field. Pass `ensure_ascii=False`.
- `tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments` asserting on the raw string. The existing parametrized `test_parse_response_candidate` round-trips through `json.loads` and is blind to the encoding difference.
Heads-up for reviewers: the This is the same Cloud Build live-API failure that's currently hitting my parallel vertexai PR #1820 (a one-line change to This PR only touches Disclaimer: this comment was prepared with the assistance of an AI agent (Claude Code).
Humphrey (HumphreySun98) added a commit to HumphreySun98/langchain-google that referenced this pull request on Jun 8, 2026
…ation
`BigQueryCallbackHandler` (and the langgraph/async variants) build the content for BigQuery JSON columns with bare `json.dumps(...)`. Python's default `ensure_ascii=True` escapes every non-ASCII character to `\uXXXX`, so CJK, emoji, and accented text from chain inputs, outputs, documents, tool calls, agent actions, and langgraph attributes lands in storage as escape sequences and is unreadable when inspecting the BigQuery row directly.
Pass `ensure_ascii=False` at every `json.dumps` site in `callbacks/bigquery_callback.py` and add unit-test coverage on `_prepare_arrow_batch` asserting CJK and emoji round-trip into the resulting `pa.RecordBatch`.
The convention matches what `langchain-openai`, `langchain-core` (`messages/utils.py:1810`), and our just-shipped genai/vertexai `_parse_response_candidate` fixes (langchain-ai#1804, langchain-ai#1823) already use.
Description
Two sites in `langchain-google-vertexai` serialize tool-call arguments with bare `json.dumps(...)`. Python's `json` defaults to `ensure_ascii=True`, so every non-ASCII character is escaped to `\uXXXX`. CJK text, emoji, and accented characters in tool-call args end up unreadable in `AIMessage.additional_kwargs["function_call"]["arguments"]` (the `_parse_response_candidate` path used by `ChatVertexAI` / Gemini on Vertex) and in the OpenAI-style payload built for Llama Model Garden.
This mirrors the langchain-google-genai fix in #1804 (which addresses the same pattern on the genai side) and the convention already established in `langchain-openai`'s chat model and `langchain-core`'s `messages/utils.py:1810`.
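As a minimal, standard-library-only sketch of the behavior described above (the `args` dict is an invented example, not taken from the library's code or tests), the two calls below differ only in the `ensure_ascii` keyword:

```python
import json

# Invented sample tool-call arguments containing CJK, an accented letter, and an emoji.
args = {"text": "你好", "city": "Zürich", "mood": "🎉"}

# Default behavior (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(args)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(args, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode to the same dict.
same_after_parse = json.loads(escaped) == json.loads(readable) == args
print(same_after_parse)
```

Both outputs parse back to identical data, so the difference only matters to anything that reads the serialized string directly.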
Relevant issues
None filed for the vertexai sites specifically; this is the vertexai analog of the langchain-google-genai issue #1789 / PR #1804.
Type
🐛 Bug Fix
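Changes
- `chat_models.py:818`: tool-call args round-tripped to dict via `proto.Message.to_dict(...)["args"]`, then serialized for `additional_kwargs`. Pass `ensure_ascii=False`.
- `model_garden_maas/llama.py:296`: tool-call args serialized into the OpenAI-style `tool_calls[*].function.arguments` field. Pass `ensure_ascii=False`.
- `tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments` asserting on the raw string. The existing parametrized `test_parse_response_candidate` round-trips through `json.loads` and is blind to the encoding difference.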
Testing
```
$ uv run pytest libs/vertexai/tests/unit_tests/test_chat_models.py -k parse_response_candidate -q
..............
14 passed, 96 deselected in 6.48s
```
Reverting only the one-keyword `chat_models.py` change while keeping the new test makes the test fail (`assert "你好" in '{"text": "\\u4f60\\u597d", ...}'` → AssertionError), confirming that the regression test catches the original bug.
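The sketch below illustrates why a raw-string assertion is needed, using only the standard library. `serialize_args` is a hypothetical stand-in for the serialization site, not the library's function, and the `fixed` flag exists only to compare the two behaviors side by side:

```python
import json


def serialize_args(args: dict, *, fixed: bool) -> str:
    """Hypothetical stand-in for the serialization site (not the library's code)."""
    if fixed:
        return json.dumps(args, ensure_ascii=False)
    return json.dumps(args)  # pre-fix behavior: default ensure_ascii=True


args = {"text": "你好"}


def round_trip_check(raw: str) -> bool:
    # Shape of a check that parses the string first: the escapes are decoded away.
    return json.loads(raw) == args


def raw_string_check(raw: str) -> bool:
    # Shape of the new regression check: inspect the serialized string itself.
    return "你好" in raw


buggy = serialize_args(args, fixed=False)
patched = serialize_args(args, fixed=True)

print(round_trip_check(buggy), round_trip_check(patched))
print(raw_string_check(buggy), raw_string_check(patched))
```

The round-trip check accepts both strings, while the raw-string check accepts only the patched one, which is the distinction the new test relies on.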
`ruff check` and `ruff format --check` pass on all three files.
Note
The fix is intentionally minimal: a single keyword argument in each of two `json.dumps` calls. Callers reading `tool_calls[i]["args"]` see no behavior change, because that value already round-trips through `parse_tool_calls` and so has always been a clean dict. The change only affects the legacy `additional_kwargs["function_call"]["arguments"]` string and the OpenAI-style `tool_calls[*].function.arguments` field.
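To make the scope concrete, here is a rough sketch of an OpenAI-style tool-call entry whose `function.arguments` field is a JSON string. `build_tool_call`, the tool name, and the sample arguments are all hypothetical and chosen for illustration; the actual code in `model_garden_maas/llama.py` may be structured differently:

```python
import json


def build_tool_call(name: str, args: dict, call_id: str) -> dict:
    """Hypothetical helper: build an OpenAI-style tool-call entry."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": name,
            # The arguments field is a JSON *string*, so its encoding is visible.
            "arguments": json.dumps(args, ensure_ascii=False),
        },
    }


# Invented sample arguments with non-ASCII content.
args = {"query": "天気 ☀️", "city": "São Paulo"}

entry = build_tool_call("get_weather", args, "call_0")
raw_arguments = entry["function"]["arguments"]

# A consumer that parses the string gets the same dict under either setting.
parsed_arguments = json.loads(raw_arguments)
legacy_parsed = json.loads(json.dumps(args))

print(raw_arguments)
print(parsed_arguments == legacy_parsed)
```

Only the string form changes; anything that parses the arguments before use sees identical data.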
Disclaimer: this PR was prepared with the assistance of an AI agent (Claude Code). All code and test changes were reviewed by the author before submission.