feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output by planetf1 · Pull Request #1284 · generative-computing/mellea

planetf1 · 2026-06-17T09:09:10Z

Fixes #1273. Fixes #1274.

The problem

Getting a typed Pydantic instance from a format= call required two error-prone manual steps:

result = m.act(Instruction("Classify sentiment"), format=Sentiment)

# Step 1 — re-name the model (type and format= can silently drift apart):
parsed = Sentiment.model_validate_json(str(result))

# Step 2 — if result hasn't been awaited, str(result) is "" and this raises a
# confusing ValidationError with no hint that the thunk was uncomputed.
print(parsed.label)

The return type was ModelOutputThunk[Any] — no narrowing, no safety net.

What this PR delivers

result = m.act(Instruction("Classify sentiment"), format=Sentiment)
#        ^^^^ now typed ComputedModelOutputThunk[Sentiment]

label = result.parsed.label   # typed Sentiment | None — runtime-safe, no cast needed

Two things had to land together to make this work end-to-end:

.parsed property (ComputedModelOutputThunk) — validates self.value via model_validate_json against the format type that was used at generation time. Returns S | None. Only lives on ComputedModelOutputThunk, so it can't be called on an uncomputed thunk.
format= overloads in act / aact / instruct / ainstruct (both session.py and functional.py) — when format: type[BaseModelSubclass] is passed, the return type narrows to ComputedModelOutputThunk[BaseModelSubclass]. The narrowing is observable on .parsed (typed S | None) and parsed_repr (also S | None); .value stays str unconditionally.

Together: m.act(action, format=MyModel).parsed is typed MyModel | None and runtime-correct, with no cast at the call site.

Why these two PRs were collapsed

These were originally two separate PRs (#1282 for .parsed, #1284 for the overloads). Neither was useful alone:

feat(thunk): add .parsed property to ComputedModelOutputThunk for structured output #1282 alone — .parsed exists at runtime but is typed str | None at every call site because the overloads don't yet bind S. Callers still need a cast.
feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output #1284 alone — m.act(format=MyModel) narrows to ComputedModelOutputThunk[MyModel], but there is no .parsed to use that narrowing. Callers fall back to cast(MyModel, result.value), which raises AttributeError at runtime.

@jakelorocco flagged this during review. Both halves are here now.

Implementation notes

All five built-in backends set _format on the thunk in post_processing. Custom backends must set mot._format there too; the .parsed docstring has an explicit Note: for backend authors.

Downstream wrappers that forward a dynamic format value keep a # type: ignore[assignment] with inline rationale (the clean fix is branching on format is None; react.py and the m_serve example both have explanatory comments).

The parsed_repr attribute also narrows statically to S | None via the overloads, but has a runtime gap for the Instruction path: Instruction._parse currently ignores _format and returns str. Use .parsed for structured output — it is both statically typed and runtime-safe. The root cause is tracked in #1313.

Compatibility

.value is unchanged — existing callers are unaffected.
.parsed is new; nothing today relies on it.
The overloads are type-only — no runtime behaviour changes.

Testing

Unit tests: happy path, None fallback, invalid JSON, value unchanged, copy/deepcopy _format preservation
E2E qualitative tests: test_parsed_returns_pydantic_instance for HuggingFace and Ollama backends
Static assert_type checks in test/typing/ covering all four methods, the computed/uncomputed branches, and attribute-level narrowing of parsed_repr and .parsed

Test plan

uv run pytest test/core/test_base.py -k parsed
uv run pytest test/core/ test/typing/ -m "not qualitative"
uv run pytest test/backends/test_ollama.py -k parsed -m qualitative (requires Ollama)
uv run pytest test/backends/test_huggingface.py -k parsed -m qualitative (requires GPU)

When `format=MyModel` is passed to `act()`, `instruct()`, `aact()`, or `ainstruct()` in both `functional.py` and `session.py`, the return type now narrows to `ComputedModelOutputThunk[MyModel]` (or `ModelOutputThunk[MyModel]` for the non-awaited async variant) instead of `ComputedModelOutputThunk[str]`. This eliminates the need for `cast(MyModel, result.value)` at call sites. Runtime behaviour is unchanged; all overloads were already dispatched to the same implementation body. Changes: - `functional.py` / `session.py` – new `@overload` stubs with `format: type[BaseModelSubclass]` for all four methods; implementation signatures broadened to `Any` to cover all overload combinations - `test/typing/` – `assert_type` checks for the new overload resolution paths in all four typing-check modules - `genstub.py` – `# type: ignore[return-value]` on four existing return sites that rely on the pre-narrowed `R` type variable which the new overloads can no longer infer - `react.py` / `m_serve_example_response_format.py` – `# type: ignore` on call sites that pass a dynamic `format` value incompatible with the new stricter overload signatures Closes generative-computing#1274 Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Add brief inline comments to the three type: ignore additions introduced by the format= overload threading, explaining why each ignore is intentional rather than masking a real issue. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

jakelorocco · 2026-06-17T12:49:46Z

I read #1282 before this one. I think this encounters the same issue. We need to resolve disparate typing issues, decide what we want to have decide the output type of a mot, and then propagate the actual type to the model output thunk so that users can actually utilize it without casting / overriding / asserting a type.

planetf1 · 2026-06-19T12:52:21Z

@markstur @AngeloDanducci @jakelorocco @ajbozarth — this PR has been open for 2 days with no reviews yet. Happy to answer questions or discuss the approach.

ajbozarth

Some feedback from Claude — follow-ups inline.

Headline concern: the overloads narrow the result generic to MyModel, but at runtime no attribute on the resulting thunk is actually a MyModel instance for an Instruction(format=MyModel) call — so the type promise the PR makes can't be cashed in without a cast on the unwrap, which is exactly the footgun #1274 was trying to remove. Details inline.

ajbozarth · 2026-06-19T17:42:38Z

+        format: type[BaseModelSubclass],
+        model_options: dict | None = None,
+        tool_calls: bool = False,
+    ) -> ComputedModelOutputThunk[BaseModelSubclass]: ...


The PR description's before/after says:

result, _ = act(action, ctx, backend, format=MyModel) obj = result.value # typed as MyModel

Under these overloads result does narrow to ComputedModelOutputThunk[MyModel], but ComputedModelOutputThunk.value is annotated -> str (mellea/core/base.py:875) — it doesn't depend on S. So result.value is still typed as str, and the only attribute the new S = MyModel actually narrows is parsed_repr: S | None.

But parsed_repr's runtime path is self._action._parse(self) (mellea/core/base.py:707), and Instruction._parse returns computed.value if computed.value is not None else "" (mellea/stdlib/components/instruction.py:236) — i.e., str, regardless of format=. So for the canonical m.act(Instruction(...), format=MyModel) pattern, result.parsed_repr is typed MyModel | None but is actually a str at runtime. pyright will accept result.parsed_repr.some_field, and it'll AttributeError at runtime — the exact silent-failure shape #1273 / #1274 were filed against, just relocated from .value to .parsed_repr.

The genstub path (where _parse calls model_validate_json) is the one place this narrowing is actually honored at runtime — and the PR has to # type: ignore the genstub return values anyway. So the PR ships a typed promise that's only kept for the one component type that already had a working unwrap.

Worth resolving with #1282's author before merge: a coherent end-state probably looks like making ComputedModelOutputThunk.parsed generic over S and returning S (so result.parsed: MyModel), backed by a runtime path that actually delivers a MyModel instance for Instruction(format=...). As-is, this PR removes the cast from the .value line at the cost of inviting one on .parsed_repr that the type system hides. Net safety is debatable.

You're right — parsed_repr has the same runtime gap, just relocated. Looking into the best fix before this merges.

The runtime gap is real — parsed_repr routes through Instruction._parse which always returns str, so the narrowing is a lie at runtime. The fix is in #1282, which adds a .parsed property backed by model_validate_json. Once that merges, the story is clean:

result = session.act(action, ctx, backend, format=MyModel) obj = result.parsed # MyModel | None — typed and runtime-correct, no cast

I'll retarget the overloads here to .parsed after #1282 is merged. Moving this to draft until then.

Now resolved in this PR. The collapse of #1282 into this branch means .parsed is available right here — it's backed by model_validate_json so it's both statically typed S | None and runtime-correct. Updated in 72b0e66e:

The session.py overload comment now names all three attributes and explicitly directs callers to .parsed (typed + runtime-safe) rather than parsed_repr (typed but runtime gap via Instruction._parse → str).

check_session.py now asserts assert_type(r.parsed, _M | None) at the call site, confirming the end-to-end narrowing holds.

parsed_repr's runtime gap for the Instruction path remains — fixing that requires changing _parse to be format-aware, which is a wider change left for a follow-up. The important thing is callers now have a clear, correct path via .parsed.

Filed #1313 to track the parsed_repr runtime gap for the Instruction path and explain why it wasn't fixed here. The issue covers the root cause (Instruction._parse ignores _format), why a proper fix needs Instruction to become generic, and that .parsed is the correct workaround in the meantime.

planetf1 · 2026-06-22T16:09:39Z

@ajbozarth @jakelorocco lots of changes on this as I merged the two previously independent PRs. It's easier to see the objective and result in a single PR - so hope this is clearer

Explain why the format= overloads can't narrow the genstub return type: the overloads narrow the thunk's element type, but a genstub returns the unwrapped inner value R, not the thunk, and parsed_repr (S | None) can't be re-bound to R at this boundary. Add a TODO pointing at the clean shape (ComputedModelOutputThunk[R] with the FunctionResponse[R] unwrap in a typed parse step), noting it depends on the thunk-generics redesign out of scope here. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The existing check only confirmed the overload resolved to ComputedModelOutputThunk[_M]; it did not pin what the attributes are typed. Add check_act_format_attributes asserting parsed_repr narrows to `_M | None` (what the overloads actually narrow) and documenting that `.value` stays unconditionally `str` — the known limitation pending the coordinated thunk-generics / `.parsed` redesign. Locks in what IS narrowed and calls out what isn't. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

@overload

Document that the act() implementation return type is widened to `Any` on purpose: the @overload signatures own the precise S propagation and the format= -> BaseModelSubclass narrowing, and tightening the body to the bare `[S]` case would conflict with the format= and sampling overloads. Callers always resolve against an overload, never the implementation body. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

A third union overload `format: type[BaseModelSubclass] | None` cannot narrow cleanly: it overlaps the existing `format=None` overload, so the return type collapses to the union and the narrowing is lost. The clean fix for a wrapper forwarding a dynamic format is to branch on `format is None` so each call matches a narrow overload. Add comments at the react and m_serve passthrough sites explaining this and why the ignore is preferred over branching at those single call sites. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The format= overloads narrow the thunk's generic element type, observable on parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is typed `-> str` unconditionally. parsed_repr also currently routes through Instruction._parse (returns str), so parsed_repr.some_field type-checks but AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to match (was claiming .value narrows to MyModel). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…e: ignore Serving wrapper now calls instruct() in two branches so each matches a narrow overload. The react.py site (already inside if format is not None) is a separate problem requiring broader restructuring; documented in-place. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

GenerativeStub._parse already unwraps FunctionResponse[R] and returns R at runtime. The thunk types parsed_repr as S | None (S = FunctionResponse[R]) because the overloads narrow S to the format type, not R. Replace the return-value ignores with an explicit cast to make the coercion visible. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

jakelorocco

I talked with Nathan about this. We want to maintain the existing behavior; which I'm assuming we don't have clearly documented and which lead to this issue/PR.

format can determine the shape of the output (ie what tokens are allowed, etc...)
a component can declare the shape of the object generated from it (ie a Message components declares that it will result in another Message)

These two are usually related (and we should maybe make it easier to express this so that if a component requires a specific schema to parse, you don't have to pass re-define / pass in separate format var). But a component can also declare arbitrary parsing rules that can operate on several formats.

If you want automatic typing on the output of the mot, you should pass in a specific component. Otherwise, you'll have to model_validate the output to get the type from the raw string.

github-actions Bot added the enhancement New feature or request label Jun 17, 2026

planetf1 force-pushed the worktree-issue-1274 branch from abec17a to 7370b83 Compare June 17, 2026 10:53

planetf1 marked this pull request as ready for review June 17, 2026 11:35

planetf1 requested a review from a team as a code owner June 17, 2026 11:35

planetf1 requested review from AngeloDanducci, jakelorocco and markstur June 17, 2026 11:35

planetf1 requested a review from ajbozarth June 19, 2026 12:51

ajbozarth requested changes Jun 19, 2026

View reviewed changes

planetf1 enabled auto-merge June 22, 2026 11:42

planetf1 marked this pull request as draft June 22, 2026 14:37

auto-merge was automatically disabled June 22, 2026 14:37
Pull request was converted to draft

planetf1 force-pushed the worktree-issue-1274 branch from 52981cd to 61cb63a Compare June 22, 2026 15:07

planetf1 changed the title ~~feat(types): thread format= overloads for cast-free structured output~~ feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output Jun 22, 2026

This was referenced Jun 22, 2026

feat(thunk): add .parsed property to ComputedModelOutputThunk for structured output #1282

Closed

bug: parsed_repr has runtime/type mismatch for Instruction when format= is set #1313

Open

planetf1 marked this pull request as ready for review June 22, 2026 16:09

planetf1 requested a review from nrfulton as a code owner June 22, 2026 16:09

planetf1 added 7 commits June 22, 2026 17:21

planetf1 force-pushed the worktree-issue-1274 branch from 3676f7a to c56d89d Compare June 22, 2026 16:21

jakelorocco requested changes Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284

feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284
planetf1 wants to merge 9 commits into
generative-computing:mainfrom
planetf1:worktree-issue-1274

planetf1 commented Jun 17, 2026 •

edited

Loading

Uh oh!

jakelorocco commented Jun 17, 2026

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

ajbozarth left a comment

Uh oh!

ajbozarth Jun 19, 2026

Uh oh!

planetf1 Jun 22, 2026

Uh oh!

planetf1 Jun 22, 2026

Uh oh!

planetf1 Jun 22, 2026

Uh oh!

planetf1 Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented Jun 22, 2026

Uh oh!

jakelorocco left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

planetf1 commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The problem

What this PR delivers

Why these two PRs were collapsed

Implementation notes

Compatibility

Testing

Test plan

Uh oh!

jakelorocco commented Jun 17, 2026

Uh oh!

planetf1 commented Jun 19, 2026

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented Jun 22, 2026

Uh oh!

jakelorocco left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

planetf1 commented Jun 17, 2026 •

edited

Loading