fix(genai): preserve non-ASCII in tool-call arguments JSON #1804
Open
Humphrey (HumphreySun98) wants to merge 1 commit into
`_parse_response_candidate` in `chat_models.py` serialized `additional_kwargs["function_call"]["arguments"]` with the default `json.dumps(...)`, which escapes every non-ASCII character to `\uXXXX`. CJK text, accented characters, and emoji that the model returns in tool-call arguments became unreadable when persisted to JSON columns or log files.

The same arguments stay correct in `tool_calls[i]["args"]` (a clean dict, because `parse_tool_calls` round-trips through `json.loads`), so consumers see different content depending on which field they read.

`langchain-openai` already passes `ensure_ascii=False` at the analogous site (`langchain_openai/chat_models/base.py`), and `langchain-core` follows the same convention across its `json.dumps` call sites that touch message content. This change makes `langchain-google-genai` match.

The existing parametrized `test_parse_response_candidate` cases round-trip arguments through `json.loads` before comparing for equality, so they were blind to the encoding difference. The new regression test asserts on the raw string.

Fixes langchain-ai#1789
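The difference between the two fields can be reproduced with the standard library alone. The following is a minimal sketch, not the code from `chat_models.py`: the `args` dict and the variable names are hypothetical, and only `json.dumps` / `json.loads` and the `ensure_ascii` keyword are taken from the description above.

```python
import json

# Hypothetical tool-call arguments (illustrative values, not from the PR):
# Korean text, an accented character, and an emoji.
args = {"city": "서울", "note": "café ☕"}

# Default behavior: ensure_ascii=True, every non-ASCII character becomes \uXXXX.
escaped = json.dumps(args)

# With the keyword the fix passes: non-ASCII characters are written as-is.
preserved = json.dumps(args, ensure_ascii=False)

print(escaped)    # pure-ASCII string containing \u escape sequences
print(preserved)  # human-readable string containing the original characters

# Both strings are valid JSON and decode to the same dict, which is why a
# consumer reading a parsed dict never notices the difference.
assert json.loads(escaped) == json.loads(preserved) == args
```

Both outputs are equivalent JSON; the change only affects what a person or a text search sees when looking at the stored string directly.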
Author
Hi Mason Daugherty (@mdrxy) — friendly nudge when you get a chance. This is a one-keyword fix in genai's |
This was referenced Jun 2, 2026
Humphrey (HumphreySun98) added a commit to HumphreySun98/langchain-google that referenced this pull request Jun 8, 2026
…ation

`BigQueryCallbackHandler` (and the langgraph/async variants) build the content for BigQuery JSON columns with bare `json.dumps(...)`. Python's default `ensure_ascii=True` escapes every non-ASCII character to `\uXXXX`, so CJK, emoji, and accented text from chain inputs, outputs, documents, tool calls, agent actions, and langgraph attributes lands in storage as escape sequences and is unreadable when inspecting the BigQuery row directly.

Pass `ensure_ascii=False` at every `json.dumps` site in `callbacks/bigquery_callback.py` and add unit-test coverage on `_prepare_arrow_batch` asserting that CJK and emoji round-trip into the resulting `pa.RecordBatch`.

The convention matches what `langchain-openai`, `langchain-core` (`messages/utils.py:1810`), and our just-shipped genai/vertexai `_parse_response_candidate` fixes (langchain-ai#1804, langchain-ai#1823) already use.
Description
`_parse_response_candidate` in `libs/genai/langchain_google_genai/chat_models.py` serialized `additional_kwargs["function_call"]["arguments"]` with the default `json.dumps(...)`. Python's `json` defaults to `ensure_ascii=True`, so every non-ASCII character is escaped to `\uXXXX`. CJK text, accented characters, and emoji in tool-call arguments became unreadable when written to JSON columns / log files.

The same arguments stay correct in `tool_calls[i]["args"]` (a clean dict, because `parse_tool_calls` round-trips through `json.loads`), so consumers see different content depending on which field they read. From #1789:

`langchain-openai` already passes `ensure_ascii=False` at the analogous call site in `langchain_openai/chat_models/base.py`, and `langchain-core` follows the same convention across its `json.dumps` sites that touch message content. This change makes `langchain-google-genai` match with one keyword argument.

Relevant issues
Fixes #1789
Type
🐛 Bug Fix
Changes
- `libs/genai/langchain_google_genai/chat_models.py`: pass `ensure_ascii=False` to the `json.dumps` that produces `function_call["arguments"]`.
- `libs/genai/tests/unit_tests/test_chat_models.py`: add `test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments`, asserting on the raw string (not the `json.loads` round-trip; the existing parametrized tests round-trip and so were blind to the encoding difference).

Testing
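Why a round-trip comparison cannot catch this bug while a raw-string assertion can is easiest to see side by side. This is a hedged sketch rather than the actual regression test: `serialize_arguments` is a hypothetical stand-in for the `json.dumps` call inside `_parse_response_candidate`, and the sample value is chosen to match the `\uc548\ub155` escape sequence quoted from the issue.

```python
import json


def serialize_arguments(args: dict, ensure_ascii: bool) -> str:
    """Hypothetical stand-in for the json.dumps call that fills
    function_call["arguments"]; not the library's real function."""
    return json.dumps(args, ensure_ascii=ensure_ascii)


expected = {"greeting": "안녕"}

buggy = serialize_arguments(expected, ensure_ascii=True)   # pre-fix behavior
fixed = serialize_arguments(expected, ensure_ascii=False)  # post-fix behavior

# Round-trip comparison: json.loads undoes the escaping, so this check
# passes for both versions and cannot tell them apart.
round_trip_buggy = json.loads(buggy) == expected
round_trip_fixed = json.loads(fixed) == expected

# Raw-string comparison: looks at the serialized text itself, so it
# passes only when the original characters survive serialization.
raw_buggy = "안녕" in buggy
raw_fixed = "안녕" in fixed

print(round_trip_buggy, round_trip_fixed)
print(raw_buggy, raw_fixed)
```

Only the raw-string check separates the two serializations, which is the property a regression test for this change needs.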
Reverting only the one-line `chat_models.py` change while keeping the new test makes that test fail with the exact `'\\uc548\\ub155...'` escape sequence from the issue, confirming the test pins the buggy behavior. `ruff check` and `ruff format --check` pass on both files.

Note
Per `CLAUDE.md` PR guidelines: this fix was prepared with the assistance of an AI agent (Claude Code). All code and test changes were reviewed by the author before submission.