fix(genai): preserve non-ASCII in tool-call arguments JSON by HumphreySun98 · Pull Request #1804 · langchain-ai/langchain-google

Humphrey (HumphreySun98) · 2026-05-25T19:09:51Z

Description

_parse_response_candidate in libs/genai/langchain_google_genai/chat_models.py serialized additional_kwargs["function_call"]["arguments"] with the default json.dumps(...). Python's json defaults to ensure_ascii=True, so every non-ASCII character is escaped to \uXXXX. CJK text, accented characters, and emoji in tool-call arguments became unreadable when written to JSON columns / log files.

The same arguments stay correct in tool_calls[i]["args"] (a clean dict, because parse_tool_calls round-trips through json.loads), so consumers see different content depending on which field they read. From #1789:

msg.tool_calls[0]["args"]                              # → {'text': '안녕하세요'}        ✅
msg.additional_kwargs["function_call"]["arguments"]    # → '{"text": "\\uc548\\ub155\\ud558\\uc138\\uc694"}'  ❌
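The discrepancy can be reproduced with the standard library alone. The sketch below is illustrative and is not the code in chat_models.py; the `args` dict simply mirrors the example from the issue.

```python
import json

# Arguments as the model might return them (mirrors the example in #1789).
args = {"text": "안녕하세요"}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(args)

# With ensure_ascii=False the characters are written through unchanged.
readable = json.dumps(args, ensure_ascii=False)

print(escaped)   # {"text": "\uc548\ub155\ud558\uc138\uc694"}
print(readable)  # {"text": "안녕하세요"}

# Both strings are valid JSON and decode to the same dict, which is why
# tool_calls[i]["args"] looked correct while the raw string did not.
assert json.loads(escaped) == json.loads(readable) == args
```

Both outputs are equivalent to a JSON parser; the difference only matters to anything that reads the raw string, such as a person inspecting a log line or a database row.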

langchain-openai already passes ensure_ascii=False at the analogous call site in langchain_openai/chat_models/base.py, and langchain-core follows the same convention across its json.dumps sites that touch message content. This change makes langchain-google-genai match by adding one keyword argument.

Relevant issues

Fixes #1789

Type

🐛 Bug Fix

Changes

libs/genai/langchain_google_genai/chat_models.py: pass ensure_ascii=False to the json.dumps that produces function_call["arguments"].
libs/genai/tests/unit_tests/test_chat_models.py: add test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments asserting on the raw string (not the json.loads round-trip — the existing parametrized tests round-trip and so were blind to the encoding difference).
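To illustrate why a round-trip assertion cannot see this bug while a raw-string assertion can, here is a minimal sketch. It is not the test added in this PR: `serialize_arguments`, `roundtrip_check`, and `raw_string_check` are hypothetical names, and `serialize_arguments` only stands in for the json.dumps call that produces function_call["arguments"].

```python
import json


def serialize_arguments(args: dict, *, ensure_ascii: bool) -> str:
    """Hypothetical stand-in for the json.dumps call under test."""
    return json.dumps(args, ensure_ascii=ensure_ascii)


def roundtrip_check(raw: str, expected: dict) -> bool:
    # Style of the existing parametrized tests: decode, then compare dicts.
    return json.loads(raw) == expected


def raw_string_check(raw: str, text: str) -> bool:
    # Style of the new regression test: inspect the serialized string itself.
    return text in raw and "\\u" not in raw


expected = {"text": "안녕하세요"}
buggy = serialize_arguments(expected, ensure_ascii=True)   # old behaviour
fixed = serialize_arguments(expected, ensure_ascii=False)  # new behaviour

# The round-trip check passes for both, so it cannot tell them apart.
print(roundtrip_check(buggy, expected), roundtrip_check(fixed, expected))

# The raw-string check fails on the escaped output and passes on the fix.
print(raw_string_check(buggy, "안녕하세요"), raw_string_check(fixed, "안녕하세요"))
```

The round-trip check returns True for both strings, while the raw-string check distinguishes them, which matches the reasoning given for asserting on the raw string.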

Testing

$ uv run --group test pytest tests/unit_tests/test_chat_models.py -k parse_response_candidate -v
...
test_parse_response_candidate_preserves_non_ascii_in_function_call_arguments PASSED
============= 17 passed, 203 deselected in 5.18s =============

Reverting only the one-line chat_models.py change while keeping the new test makes that test fail with the exact '\\uc548\\ub155...' escape sequence from the issue, confirming that the test catches the bug.

ruff check and ruff format --check pass on both files.

Note

Per CLAUDE.md PR guidelines: this fix was prepared with the assistance of an AI agent (Claude Code). All code and test changes were reviewed by the author before submission.

`_parse_response_candidate` in `chat_models.py` serialized `additional_kwargs["function_call"]["arguments"]` with the default `json.dumps(...)`, which escapes every non-ASCII character to `\uXXXX`. CJK text, accented characters, and emoji that the model returns in tool call arguments became unreadable when persisted to JSON columns / log files.

The same arguments stay correct in `tool_calls[i]["args"]` (a clean dict, because `parse_tool_calls` round-trips through `json.loads`), so consumers see different content depending on which field they read.

`langchain-openai` already passes `ensure_ascii=False` at the analogous site (`langchain_openai/chat_models/base.py`), and `langchain-core` follows the same convention across its `json.dumps` call sites that touch message content. This change makes `langchain-google-genai` match.

The existing parametrized `test_parse_response_candidate` cases round-trip arguments through `json.loads` for equality, so they were blind to the encoding difference. The new regression test asserts on the raw string.

Fixes langchain-ai#1789

Humphrey (HumphreySun98) · 2026-06-01T16:39:30Z

Hi Mason Daugherty (@mdrxy) — friendly nudge when you get a chance. This is a one-keyword fix in genai's _parse_response_candidate (ensure_ascii=False in json.dumps), matching the convention already used by langchain-openai's chat model and langchain-core's messages/utils.py:1810. Regression test fails before and passes after; CI green across all 5 Python versions. Happy to adjust placement / naming if you'd prefer.

…ation `BigQueryCallbackHandler` (and the langgraph/async variants) build the content for BigQuery JSON columns with bare `json.dumps(...)`. Python's default `ensure_ascii=True` escapes every non-ASCII character to `\uXXXX`, so CJK / emoji / accented text from chain inputs, outputs, documents, tool calls, agent actions, and langgraph attributes land in storage as escape sequences and are unreadable when inspecting the BigQuery row directly. Pass `ensure_ascii=False` at every `json.dumps` site in `callbacks/bigquery_callback.py` and add unit-test coverage on `_prepare_arrow_batch` asserting CJK and emoji round-trip into the resulting `pa.RecordBatch`. The convention matches what `langchain-openai`, `langchain-core` (`messages/utils.py:1810`), and our just-shipped genai/vertexai `_parse_response_candidate` fixes (langchain-ai#1804, langchain-ai#1823) already use.
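The same effect applies to nested payloads, and emoji outside the Basic Multilingual Plane are escaped as a surrogate pair (two `\uXXXX` sequences per character). The following is a rough sketch using only the standard library; `to_json_column` is a hypothetical helper and not a function in `bigquery_callback.py`, and the payload is made up for illustration.

```python
import json


def to_json_column(payload: object) -> str:
    # Hypothetical helper: serialize a payload for a JSON column while
    # keeping non-ASCII characters readable in storage.
    return json.dumps(payload, ensure_ascii=False)


# Made-up payload mixing accented, CJK, and emoji text at several depths.
payload = {"input": "café", "output": ["你好", "😀"]}

default_json = json.dumps(payload)     # escapes every non-ASCII character
column_json = to_json_column(payload)  # leaves the characters untouched

print(default_json)
print(column_json)

# The emoji U+1F600 becomes a surrogate pair in the default output.
assert "\\ud83d\\ude00" in default_json
assert "\\u" not in column_json

# The readable form is still ordinary JSON and encodes cleanly as UTF-8.
assert json.loads(column_json.encode("utf-8")) == payload
```

One design note: output produced with `ensure_ascii=False` contains non-ASCII characters, so whatever sink receives it needs to handle UTF-8.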

This was referenced Jun 2, 2026

fix(vertexai): preserve non-ASCII in tool-call arguments JSON #1823 (Open)

fix(community): preserve non-ASCII in BigQuery callback JSON serialization #1827 (Open)

Humphrey (HumphreySun98) wants to merge 1 commit into langchain-ai:main from HumphreySun98:fix/genai-tool-call-args-unicode-escape
