Question: are JsonSchema constraints expected to hold under non-alternating chat turn sequences (e.g. two ctx.user() before ctx.cue())? #358

Description

@shsym

TL;DR. First: the Sampler::Argmax + .constrain_with(JsonSchema) pattern is wonderful; it lets us guarantee well-formed structured output by construction, which is the right primitive. While building on it, we noticed that on Qwen3-8B, with our PR-Agent prompt, the constraint appears to take effect under the canonical system → user → cue shape but not under system → user → user → cue (two consecutive user calls before cue). We think the non-alternating pattern is the trigger, although neutral prompts with the same shape still produced JSON (see the variant table). The pattern is technically non-idiomatic, since every pie example uses strict alternation, but the API accepts it silently, so we wanted to ask whether constraint enforcement should be robust to it (or whether calling out the expected turn pattern in the docs would be enough).

Environment. pie 0.3.0 · Qwen/Qwen3-8B · portable + Metal · macOS.

Canonical pattern (works as expected)

ctx.system(SYSTEM_PROMPT);
ctx.user(USER_MESSAGE);
ctx.cue();

let text = ctx.generate(Sampler::Argmax)
    .max_tokens(N)
    .constrain_with(JsonSchema(SCHEMA))?
    .collect_text()
    .await?;
// → valid JSON, schema-conforming.

This matches every example we found in pie's inferlets/ directory (json-schema-validation, constrained-decoding, template-generation, demo-grammar, agent-react, agent-codeact, recursion-of-thought).

What we tried (and why)

PR-Agent forwards an OpenAI-shaped messages array. The initial inferlet walked the array with the appropriate role-filler, then appended a small trailing user-turn (/no_think-style directive) before ctx.cue():

for msg in &input.messages {        // ends with a user-role message
    match msg.role.as_str() {
        "system"    => { ctx.system(&msg.content); }
        "user"      => { ctx.user(&msg.content); }
        "assistant" => { ctx.assistant(&msg.content); }
        _ => return Err("unsupported role".into()),
    }
}
ctx.user("/no_think\n\nReply with valid JSON only matching the schema above.");
ctx.cue();
// same generate(...).constrain_with(...).collect_text() as above
// → runaway `<think>` block, schema not honored.

That produces two consecutive <|im_start|>user...<|im_end|> blocks with no assistant (or cue → generate) between them. We realized afterward this isn't a pattern any pie example uses — all canonical flows maintain strict user/assistant alternation.
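To make the turn shape concrete, here is a minimal sketch that renders a message list in the ChatML layout quoted above. `Message` and `render_chatml` are hypothetical names used only for illustration and are not part of pie; the trailing `<|im_start|>assistant` line is an assumption about what `ctx.cue()` emits for Qwen-style templates.

```rust
/// One chat turn, mirroring the OpenAI-shaped messages PR-Agent forwards.
/// Hypothetical type for illustration; not a pie SDK type.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Render turns as `<|im_start|>role ... <|im_end|>` blocks, then open an
/// assistant turn (assumed here to be what `ctx.cue()` does).
pub fn render_chatml(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        out.push_str("<|im_start|>");
        out.push_str(&m.role);
        out.push('\n');
        out.push_str(&m.content);
        out.push_str("<|im_end|>\n");
    }
    // Open the assistant turn so generation would start here.
    out.push_str("<|im_start|>assistant\n");
    out
}
```

Rendering system, user, user with this helper yields two adjacent user blocks with nothing between them, which is the shape that variant 6 exercises.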

Variant isolation

To isolate whether the trigger was PR-Agent's content, prompt size, or the turn shape, we ran six variants on the same engine/model:

| Variant | Setup | Output |
|---|---|---|
| 1 | neutral sys + 1 neutral user + cue | JSON ✓ |
| 2 | neutral sys + 2 neutral users + cue | JSON ✓ |
| 3 | neutral sys + 1 long (16k tok) user + cue | JSON ✓ |
| 4 | neutral sys + 1 long user + 1 short user + cue | JSON ✓ |
| 5 | PR-Agent sys + PR-Agent user + cue | JSON ✓ |
| 6 | PR-Agent sys + PR-Agent user + 1 short user + cue | NL ✗ |

Variant 5 (canonical alternation, real PR-Agent content) works. Variant 6 (same content + one trailing non-alternating user turn) breaks. Removing that trailing user-turn restores constraint enforcement — that's the workaround we shipped as pr-review@0.1.2.
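If the directive itself needs to be kept, one alternative to dropping it is to fold it into the last user message, so the turn sequence stays alternating. The sketch below is an assumption-laden illustration, not what pr-review@0.1.2 does: `Message` and `append_directive` are hypothetical names, and the variants above do not establish whether this shape preserves constraint enforcement on Qwen3-8B.

```rust
/// One chat turn in the OpenAI-shaped messages array.
/// Hypothetical type for illustration; not a pie SDK type.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Attach `directive` without creating two consecutive user turns:
/// if the list already ends with a user message, append to its content;
/// otherwise add the directive as a new user turn.
pub fn append_directive(messages: &mut Vec<Message>, directive: &str) {
    let ends_with_user = messages.last().map_or(false, |m| m.role == "user");
    if ends_with_user {
        if let Some(last) = messages.last_mut() {
            last.content.push_str("\n\n");
            last.content.push_str(directive);
        }
    } else {
        messages.push(Message {
            role: "user".to_string(),
            content: directive.to_string(),
        });
    }
}
```

The messages would then be walked with the same role-filler loop as before, followed by a single `ctx.cue()`.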

The question

Grammar constraints are documented as enforcing structural validity at every generated token, which (naively) reads as "robust to any prompt shape." On the other hand, non-alternating turn sequences are non-idiomatic and out-of-distribution for chat-tuned models, so unexpected behavior on them is at least defensible.

Possible resolutions, listed from cheapest to most invasive:

  1. Document the canonical turn pattern. Note in the Context filler docs (or the JsonSchema constraint docs) that turn-fillers are designed for alternating sequences, and that constraint enforcement is most reliable on patterns matching the example inferlets.
  2. Soft warn at the SDK boundary. Optionally log a debug message when two consecutive same-role fillers are appended.
  3. Investigate whether the constraint mechanism could be made prompt-shape-robust. Lower priority if (1) is in place.
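Option (2) could be as small as a check over the sequence of roles appended so far. The following is a minimal sketch under stated assumptions: `Role` and `repeated_role_indices` are hypothetical names, and it assumes the SDK (or the inferlet) keeps a record of which fillers were called, which pie's Context may or may not expose.

```rust
/// Role of a turn-filler call. Hypothetical enum for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
}

/// Return the index of every turn whose role equals the previous turn's role.
/// An empty result means no two adjacent turns share a role.
pub fn repeated_role_indices(roles: &[Role]) -> Vec<usize> {
    roles
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| pair[0] == pair[1])
        .map(|(i, _)| i + 1) // index of the second turn in the repeated pair
        .collect()
}
```

A caller could log a debug message for each returned index just before `ctx.cue()`, leaving behavior otherwise unchanged.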

We're happy with (1) — the workaround is trivial once you know to look for it, and downstream we've documented the gotcha for our users. Filing this in case the observation is useful to the project.

Reproducer

  • Inferlets: pr-review and pr-review-diag* at https://github.com/shsym/pie-pr-review (current pr-review@0.1.2 ships the canonical-alternation workaround).
  • Trigger content: captured PR-Agent payload at tests/e2e/payloads/2026-05-14-pr1-pr-agent.json — feed it through the broken-code path above to reproduce.

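The variant table narrows down the trigger: schema, model, and sampler are constant across variants 1–6. Prompt length alone (variant 3), a second user turn with neutral content (variants 2 and 4), and PR-Agent content with canonical alternation (variant 5) all produce valid JSON; only variant 6, which combines PR-Agent content with a trailing non-alternating user turn, fails.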

Thanks for the great work on grammar-constrained generation — it's been a big quality-of-life win for our use case.
