Question: are JsonSchema constraints expected to hold under non-alternating chat turn sequences (e.g. two ctx.user() before ctx.cue())? #358

**TL;DR.** First, the `Sampler::Argmax + .constrain_with(JsonSchema)` pattern is wonderful: it lets us guarantee well-formed structured output by construction, which is the right primitive. While building on it, we noticed that on Qwen3-8B, with our PR-Agent payload, the constraint appears to take effect under the canonical `system → user → cue` shape but not under `system → user → user → cue` (two consecutive `user` calls before `cue`). The same non-alternating shape with neutral content still produced valid JSON (see the variant table), so we *think* the non-alternating pattern is the trigger in combination with our content rather than a sufficient cause on its own. The pattern is non-idiomatic (every pie example uses strict alternation), but the API accepts it silently, so we wanted to ask whether constraint enforcement should be robust to it, or whether calling out the expected turn pattern in the docs would be enough.

**Environment.** pie 0.3.0 · `Qwen/Qwen3-8B` · portable + Metal · macOS.

### Canonical pattern (works as expected)

```rust
ctx.system(SYSTEM_PROMPT);
ctx.user(USER_MESSAGE);
ctx.cue();

let text = ctx.generate(Sampler::Argmax)
    .max_tokens(N)
    .constrain_with(JsonSchema(SCHEMA))?
    .collect_text()
    .await?;
// → valid JSON, schema-conforming.
```

This matches every example we found in pie's `inferlets/` directory (`json-schema-validation`, `constrained-decoding`, `template-generation`, `demo-grammar`, `agent-react`, `agent-codeact`, `recursion-of-thought`).

### What we tried (and why)

PR-Agent forwards an OpenAI-shaped `messages` array. The initial inferlet walked the array, calling the matching role filler for each message, then appended a short trailing user turn (a `/no_think`-style directive) before `ctx.cue()`:

```rust
for msg in &input.messages {        // ends with a user-role message
    match msg.role.as_str() {
        "system"    => { ctx.system(&msg.content); }
        "user"      => { ctx.user(&msg.content); }
        "assistant" => { ctx.assistant(&msg.content); }
        _ => return Err("unsupported role".into()),
    }
}
ctx.user("/no_think\n\nReply with valid JSON only matching the schema above.");
ctx.cue();
// same generate(...).constrain_with(...).collect_text() as above
// → runaway `<think>` block, schema not honored.
```

That produces two consecutive `<|im_start|>user...<|im_end|>` blocks with no `assistant` (or `cue → generate`) between them. We realized afterward this isn't a pattern any pie example uses — all canonical flows maintain strict user/assistant alternation.
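
To make the shape concrete, here is a minimal sketch of what the prompt would look like once rendered, assuming the fillers emit plain ChatML and that `cue` opens an assistant turn. `Message`, `render_chatml`, and `repeated_role_positions` are hypothetical helpers written for this illustration. They are not part of pie's API, and pie's actual rendering may differ.

```rust
/// Hypothetical stand-in for one OpenAI-shaped chat message (not a pie type).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

impl Message {
    fn new(role: &str, content: &str) -> Self {
        Self {
            role: role.to_string(),
            content: content.to_string(),
        }
    }
}

/// Render messages as ChatML blocks, then open an assistant turn.
/// Assumption: this approximates what the role fillers plus `cue` produce.
fn render_chatml(messages: &[Message]) -> String {
    let mut out = String::new();
    for m in messages {
        out.push_str(&format!("<|im_start|>{}\n{}<|im_end|>\n", m.role, m.content));
    }
    out.push_str("<|im_start|>assistant\n");
    out
}

/// Return every index `i` where message `i` has the same role as message `i - 1`.
fn repeated_role_positions(messages: &[Message]) -> Vec<usize> {
    messages
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| pair[0].role == pair[1].role)
        .map(|(i, _)| i + 1)
        .collect()
}
```

Calling `repeated_role_positions` on a `system, user, user` sequence reports index 2, the position of the trailing directive turn in the failing path above.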

### Variant isolation

To isolate whether the trigger was PR-Agent's content, prompt size, or the turn shape, we ran six variants on the same engine/model:

| Variant | Setup | Output |
| --- | --- | --- |
| 1 | neutral sys + 1 neutral user + cue | JSON ✓ |
| 2 | neutral sys + 2 neutral users + cue | JSON ✓ |
| 3 | neutral sys + 1 long (16k tok) user + cue | JSON ✓ |
| 4 | neutral sys + 1 long user + 1 short user + cue | JSON ✓ |
| 5 | **PR-Agent** sys + **PR-Agent** user + cue | JSON ✓ |
| 6 | PR-Agent sys + PR-Agent user + **1 short user** + cue | **NL ✗** |

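Variant 5 (canonical alternation, real PR-Agent content) works. Variant 6 (same content plus one trailing non-alternating user turn) breaks: the output is natural language (NL) rather than JSON. Variants 2 and 4 show that two consecutive user turns with neutral content still yield JSON, so the failure seems to need both the PR-Agent content and the extra user turn. Removing that trailing user turn restores constraint enforcement; that's the workaround we shipped as `pr-review@0.1.2`.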
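
If the trailing directive is still wanted, one option that keeps alternation is to fold it into the final user message instead of adding a new turn. The sketch below illustrates that idea only. `Message` and `append_directive` are hypothetical names, not pie types, and this variant has not been verified to restore constraint enforcement on the PR-Agent payload.

```rust
/// Hypothetical stand-in for one OpenAI-shaped chat message (not a pie type).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Attach `directive` without creating two consecutive user turns.
/// If the last message is a user turn, the directive is appended to its
/// content; otherwise it becomes a new user turn.
fn append_directive(mut messages: Vec<Message>, directive: &str) -> Vec<Message> {
    let ends_with_user = messages.last().map_or(false, |m| m.role == "user");
    if ends_with_user {
        if let Some(last) = messages.last_mut() {
            last.content.push_str("\n\n");
            last.content.push_str(directive);
        }
    } else {
        messages.push(Message {
            role: "user".to_string(),
            content: directive.to_string(),
        });
    }
    messages
}
```

The normalized list can then be walked with the same role-matching loop as before, followed by a single `ctx.cue()`.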

### The question

Grammar constraints are documented as enforcing structural validity at every generated token, which (naively) reads as "robust to any prompt shape." On the other hand, non-alternating turn sequences are non-idiomatic and out-of-distribution for chat-tuned models, so unexpected behavior on them is at least defensible.

Possible resolutions, listed from cheapest to most invasive:

1. **Document the canonical turn pattern.** Note in the `Context` filler docs (or the JsonSchema constraint docs) that turn-fillers are designed for alternating sequences, and that constraint enforcement is most reliable on patterns matching the example inferlets.
2. **Soft warn at the SDK boundary.** Optionally log a debug message when two consecutive same-role fillers are appended.
3. **Investigate whether the constraint mechanism could be made prompt-shape-robust.** Lower priority if (1) is in place.
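
Option 2 could be as small as remembering the last role that was appended. A rough sketch of that bookkeeping, assuming the SDK can tag each filler call with a role; `Role` and `TurnTracker` are hypothetical names used for illustration and do not reflect pie's internals:

```rust
/// Hypothetical role tag; pie's real filler API may not expose roles this way.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    System,
    User,
    Assistant,
}

/// Remembers the most recently appended role and collects a message
/// whenever the same role is appended twice in a row.
#[derive(Debug, Default)]
struct TurnTracker {
    last: Option<Role>,
    warnings: Vec<String>,
}

impl TurnTracker {
    /// Record one filler call. Returns true when it repeats the previous role.
    fn record(&mut self, role: Role) -> bool {
        let repeated = self.last == Some(role);
        if repeated {
            self.warnings
                .push(format!("consecutive {:?} turns appended", role));
        }
        self.last = Some(role);
        repeated
    }
}
```

A real implementation would presumably route the message through the SDK's logger at debug level rather than collecting strings; the vector here just keeps the sketch self-contained.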

We're happy with (1) — the workaround is trivial once you know to look for it, and downstream we've documented the gotcha for our users. Filing this in case the observation is useful to the project.

### Reproducer

- Inferlets: `pr-review` and `pr-review-diag*` at <https://github.com/shsym/pie-pr-review> (current `pr-review@0.1.2` ships the canonical-alternation workaround).
- Trigger content: captured PR-Agent payload at `tests/e2e/payloads/2026-05-14-pr1-pr-agent.json` — feed it through the broken-code path above to reproduce.

The variant table isolates the trigger: schema, model, and sampler are constant across variants 1–6; only prompt content, prompt size, and turn shape vary, and only variant 6 (PR-Agent content plus a trailing user turn) fails.

Thanks for the great work on grammar-constrained generation — it's been a big quality-of-life win for our use case.

