fix(llm): stabilize JSON-object generation (force tool_choice + native response_format + robust planner parsing) #78
Merged
Users reported unstable JSON-object generation. Two root causes:
1. Structured output never used provider-native guarantees: the synthetic
`emit_<schema>` tool was only *offered* (no `tool_choice`), and Strict/Json
modes collapsed to Tool universally — so the model could emit prose or
malformed args ("parse-and-pray").
2. The planner / pre-analysis JSON path used a naive first-`{`/last-`}` slice
with no fence handling and no repair, hard-erroring on fenced/prosey output.
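To make the second root cause concrete, here is a minimal sketch contrasting a first-`{`/last-`}` slice with a string-aware balanced scan. It is not the project's shared extractor; the function names `naive_slice` and `extract_json_object` are hypothetical, and it uses only the standard library.

```rust
/// The fragile approach: everything from the first `{` to the last `}`.
/// Trailing prose that contains a `}` ends up inside the slice.
fn naive_slice(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let end = text.rfind('}')?;
    if end < start {
        return None;
    }
    Some(&text[start..=end])
}

/// Return the first balanced `{...}` span, ignoring braces that occur inside
/// JSON string literals. Returns `None` if the object never closes.
/// (Hypothetical helper; a real extractor would also validate the span
/// with a JSON parser and try later candidates on failure.)
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (offset, ch) in text[start..].char_indices() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                // The scan starts on a `{`, so depth is at least 1 here.
                depth -= 1;
                if depth == 0 {
                    let end = start + offset + ch.len_utf8();
                    return Some(&text[start..end]);
                }
            }
            _ => {}
        }
    }
    None
}
```

Because the scan begins at the first `{`, a surrounding Markdown fence (including CRLF line endings or an uppercase language tag) is skipped without special handling. Prose that itself contains a `{` before the object would still need the validate-and-retry step mentioned in the comment.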
Fix (Tier 1 + Tier 2):
- LlmClient: additive `native_structured_support()`, `complete_structured()`,
`complete_streaming_structured()` with default impls (non-breaking — existing
clients/mocks keep working).
- structured.rs: capability-aware `resolve_mode` + `StructuredDirective`. Force
`tool_choice` (Tool/Auto), and request native `response_format`
(Strict→json_schema+strict, Json→json_object) on capable providers, falling
back to forced Tool mode otherwise.
- anthropic/openai/zhipu: honor the directive (Anthropic forced tool_choice;
OpenAI tool_choice + response_format; Zhipu delegates to its inner client).
- llm_planner: reuse the robust shared extractor + add one repair retry in
`pre_analyze`.
- generate_object: stop pre-collapsing Strict/Json to Tool (engine resolves it).
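The capability-aware routing described above can be sketched roughly as follows. Only the names `resolve_mode` and `StructuredDirective` come from this PR; the enum variants, the `NativeSupport` struct, and the function signature are assumptions made for illustration.

```rust
/// Structured-output mode requested by the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Auto,
    Tool,
    Strict,
    Json,
}

/// What a provider can guarantee natively (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct NativeSupport {
    json_schema: bool,
    json_object: bool,
}

/// Instruction handed to the provider client (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
enum StructuredDirective {
    /// Force the synthetic emit tool via `tool_choice`.
    ForceTool { tool_name: String },
    /// Request `response_format` of type `json_schema` with `strict` set.
    NativeJsonSchema,
    /// Request `response_format` of type `json_object`.
    NativeJsonObject,
}

/// Pick a native guarantee only when the provider supports it; every other
/// combination falls back to a *forced* tool call, never an offered one.
fn resolve_mode(mode: Mode, support: NativeSupport, tool_name: &str) -> StructuredDirective {
    match mode {
        Mode::Strict if support.json_schema => StructuredDirective::NativeJsonSchema,
        Mode::Json if support.json_object => StructuredDirective::NativeJsonObject,
        // Auto, Tool, and unsupported Strict/Json all land here.
        _ => StructuredDirective::ForceTool {
            tool_name: tool_name.to_string(),
        },
    }
}
```

Under these assumptions, `Auto` and `Tool` never produce a `response_format` directive, which matches the note that strict `json_schema` is opt-in only.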
Tests:
- Deep adversarial unit tests: capability/directive routing, provider
wire-format (tool_choice/response_format), adversarial JSON extraction,
planner fence/prose/brace-in-string + repair-retry. 1811 lib tests green,
fmt + clippy clean.
- New `#[ignore]` real-LLM integration test (tests/test_structured_json_real_llm.rs).
Validated end-to-end against gpt-4o via the real gateway in .a3s/config.acl:
forced tool_choice 5/5 stable (0 repairs), json_object ok, pre_analyze ok.
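The repair-retry behaviour that the planner tests exercise (and that the "0 repairs" figure counts) might look something like the sketch below. The helper name `generate_with_repair`, its closure-based signature, and the wording of the repair prompt are all hypothetical; the real `pre_analyze` presumably calls an async client instead.

```rust
/// Ask once; if the reply does not parse, re-prompt exactly once with the
/// parse error attached. Returns the parsed value plus the number of repairs
/// used (0 or 1), or an error so the caller can fall back.
fn generate_with_repair<F, P, T>(prompt: &str, mut ask: F, parse: P) -> Result<(T, u32), String>
where
    F: FnMut(&str) -> String,
    P: Fn(&str) -> Result<T, String>,
{
    let first = ask(prompt);
    match parse(&first) {
        Ok(value) => Ok((value, 0)),
        Err(err) => {
            // Hypothetical repair prompt: restate the task and the failure.
            let repair_prompt = format!(
                "{}\n\nYour previous reply could not be parsed ({}). \
                 Reply with a single JSON object and nothing else.",
                prompt, err
            );
            let second = ask(&repair_prompt);
            parse(&second)
                .map(|value| (value, 1))
                .map_err(|e| format!("repair failed: {}", e))
        }
    }
}
```

Capping the loop at a single retry keeps the worst-case cost at two model calls, after which the caller decides how to fall back.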
Notes:
- Strict json_schema is opt-in only; Auto/Tool never send response_format.
- Follow-up: override complete_streaming_structured on providers so the
streaming structured path also forces the directive (today it uses the
non-forcing default).
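The "non-forcing default" works because the new trait methods ship with default bodies. A simplified, synchronous sketch of that pattern follows; the real `LlmClient` methods have different (likely async) signatures, and `Directive`, `LegacyMock`, and `ForcingClient` are hypothetical names.

```rust
/// Hypothetical stand-in for the directive passed to a provider.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Directive {
    None,
    ForceTool(String),
}

trait LlmClient {
    /// Pre-existing method that every client already implements.
    fn complete(&self, prompt: &str) -> String;

    /// Additive method with a default body: it ignores the directive and
    /// behaves exactly like `complete`, so older clients compile unchanged.
    fn complete_structured(&self, prompt: &str, _directive: &Directive) -> String {
        self.complete(prompt)
    }
}

/// A client written before the new method existed; it needs no changes.
struct LegacyMock;

impl LlmClient for LegacyMock {
    fn complete(&self, prompt: &str) -> String {
        format!("plain:{}", prompt)
    }
}

/// A provider that overrides the default and actually honors the directive.
struct ForcingClient;

impl LlmClient for ForcingClient {
    fn complete(&self, prompt: &str) -> String {
        format!("plain:{}", prompt)
    }

    fn complete_structured(&self, prompt: &str, directive: &Directive) -> String {
        match directive {
            Directive::ForceTool(name) => format!("forced[{}]:{}", name, prompt),
            Directive::None => self.complete(prompt),
        }
    }
}
```

In this sketch a client that has not overridden the method silently takes the unforced path, which is the gap the streaming follow-up is meant to close.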
Follow-up to the blocking-path fix: providers now override
`complete_streaming_structured` so streaming structured generation also forces
`tool_choice` / sets native `response_format`, instead of falling back to the
non-forcing default.
- anthropic/openai: extract `send_streaming` from `complete_streaming`; the
trait methods become thin wrappers, and `complete_streaming_structured`
applies the directive before executing. The large streaming parsers are
unchanged.
- zhipu: already delegates to its inner client (no change).
- tests: RecordingClient records the streaming directive; new unit test
asserts `generate_streaming` forces the tool; new `#[ignore]` real-LLM
streaming case.
Validated against gpt-4o: streaming forced tool_choice -> 8 partials, valid
object (5/5 integration cases pass).
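"Applying the directive" means different request fields on each provider. As a rough illustration, the fragments below follow the publicly documented OpenAI Chat Completions and Anthropic Messages request shapes. The helper names are hypothetical, the JSON is built with plain string formatting to keep the sketch dependency-free, and names are assumed to be identifier-like (no JSON escaping is done).

```rust
enum Provider {
    OpenAi,
    Anthropic,
}

/// JSON value for the `tool_choice` request field that forces one named tool.
fn forced_tool_choice(provider: Provider, tool_name: &str) -> String {
    match provider {
        Provider::OpenAi => format!(
            r#"{{"type":"function","function":{{"name":"{}"}}}}"#,
            tool_name
        ),
        Provider::Anthropic => format!(r#"{{"type":"tool","name":"{}"}}"#, tool_name),
    }
}

/// OpenAI `response_format` value for Json mode.
fn openai_json_object_format() -> String {
    r#"{"type":"json_object"}"#.to_string()
}

/// OpenAI `response_format` value for Strict mode. `schema_json` must already
/// be a serialized JSON Schema object.
fn openai_json_schema_format(schema_name: &str, schema_json: &str) -> String {
    format!(
        r#"{{"type":"json_schema","json_schema":{{"name":"{}","strict":true,"schema":{}}}}}"#,
        schema_name, schema_json
    )
}
```

A real implementation would build these values with a JSON library rather than string formatting; the point here is only the difference in shape between the two providers.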
Addressed the streaming follow-up in |
Problem
Users reported a3s-code's JSON-object generation is unstable. Root-cause audit found two gaps:
1. Structured output never used provider-native guarantees: the synthetic `emit_<schema>` tool existed but was only offered (no `tool_choice`), and `Strict`/`Json` modes collapsed to `Tool` universally, so the model could reply with prose or malformed args. Pure parse-and-pray.
2. `llm_planner::extract_json` was a naive first-`{`/last-`}` slice: no markdown-fence handling, no balanced extraction, no repair, hard-erroring on any fenced/prosey/brace-in-string output.
Fix (Tier 1 + Tier 2)
- `LlmClient` trait (non-breaking): additive `native_structured_support()`, `complete_structured()`, `complete_streaming_structured()`, all with default impls that reproduce current behavior, so every existing client/mock keeps working.
- `structured.rs`: capability-aware `resolve_mode` + `StructuredDirective`. Force `tool_choice` for Tool/Auto; request native `response_format` (`Strict`→`json_schema`+strict, `Json`→`json_object`) only on providers that support it, falling back to forced Tool mode otherwise (never silent degradation).
- anthropic/openai/zhipu: Anthropic forces `tool_choice`; OpenAI sets `tool_choice` + `response_format`; Zhipu delegates to its inner OpenAI client. Request-building was extracted from request-execution so the structured path shares the exact HTTP/retry/parse code (`complete_streaming` left untouched).
- `llm_planner.rs`: the four parse paths now reuse the robust shared extractor, and `pre_analyze` gets one repair-retry (re-prompt for strict JSON) before falling back.
- `generate_object.rs`: stop pre-collapsing Strict/Json to Tool; the engine resolves per capability.
Tests
- Deep adversarial unit tests: capability/directive routing, provider wire-format `tool_choice`/`response_format` (Anthropic + OpenAI), adversarial extraction (CRLF/uppercase fences, prose+brace-in-string, single-quote rejection, tool-returns-text fallback), planner fence/prose/brace + repair-retry. 1811 lib tests green, `cargo fmt` + `clippy` clean.
- New `#[ignore]` real-LLM integration test (`tests/test_structured_json_real_llm.rs`) driven by `.a3s/config.acl`.
Real-LLM validation (against `gpt-4o` via the configured gateway)
- Forced `tool_choice` ×5 (stability)
- `json_object`
- `pre_analyze` (planner JSON)
- `json_schema` (strict)
Notes / follow-ups
- `Strict` (`json_schema` strict) is opt-in only; Auto/Tool never send `response_format` (avoids the OpenAI strict-subset 400 footgun).
- Follow-up: override `complete_streaming_structured` on the providers so the streaming structured path also forces the directive. Today it uses the non-forcing default (the emit tool is still present, just not forced). Not a correctness bug; a reliability optimization for the streaming path.