Skip to content

feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284

Open
planetf1 wants to merge 9 commits into
generative-computing:mainfrom
planetf1:worktree-issue-1274
Open

feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284
planetf1 wants to merge 9 commits into
generative-computing:mainfrom
planetf1:worktree-issue-1274

Conversation

@planetf1

@planetf1 planetf1 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Fixes #1273. Fixes #1274.

The problem

Getting a typed Pydantic instance from a format= call required two error-prone manual steps:

result = m.act(Instruction("Classify sentiment"), format=Sentiment)

# Step 1 — re-name the model (type and format= can silently drift apart):
parsed = Sentiment.model_validate_json(str(result))

# Step 2 — if result hasn't been awaited, str(result) is "" and this raises a
# confusing ValidationError with no hint that the thunk was uncomputed.
print(parsed.label)

The return type was ModelOutputThunk[Any] — no narrowing, no safety net.

What this PR delivers

result = m.act(Instruction("Classify sentiment"), format=Sentiment)
#        ^^^^ now typed ComputedModelOutputThunk[Sentiment]

label = result.parsed.label   # typed Sentiment | None — runtime-safe, no cast needed

Two things had to land together to make this work end-to-end:

  1. .parsed property (ComputedModelOutputThunk) — validates self.value via model_validate_json against the format type that was used at generation time. Returns S | None. Only lives on ComputedModelOutputThunk, so it can't be called on an uncomputed thunk.

  2. format= overloads in act / aact / instruct / ainstruct (both session.py and functional.py) — when format: type[BaseModelSubclass] is passed, the return type narrows to ComputedModelOutputThunk[BaseModelSubclass]. The narrowing is observable on .parsed (typed S | None) and parsed_repr (also S | None); .value stays str unconditionally.

Together: m.act(action, format=MyModel).parsed is typed MyModel | None and runtime-correct, with no cast at the call site.

Why these two PRs were collapsed

These were originally two separate PRs (#1282 for .parsed, #1284 for the overloads). Neither was useful alone:

@jakelorocco flagged this during review. Both halves are here now.

Implementation notes

All five built-in backends set _format on the thunk in post_processing. Custom backends must set mot._format there too; the .parsed docstring has an explicit Note: for backend authors.

Downstream wrappers that forward a dynamic format value keep a # type: ignore[assignment] with inline rationale (the clean fix is branching on format is None; react.py and the m_serve example both have explanatory comments).

The parsed_repr attribute also narrows statically to S | None via the overloads, but has a runtime gap for the Instruction path: Instruction._parse currently ignores _format and returns str. Use .parsed for structured output — it is both statically typed and runtime-safe. The root cause is tracked in #1313.

Compatibility

  • .value is unchanged — existing callers are unaffected.
  • .parsed is new; nothing today relies on it.
  • The overloads are type-only — no runtime behaviour changes.

Testing

  • Unit tests: happy path, None fallback, invalid JSON, value unchanged, copy/deepcopy _format preservation
  • E2E qualitative tests: test_parsed_returns_pydantic_instance for HuggingFace and Ollama backends
  • Static assert_type checks in test/typing/ covering all four methods, the computed/uncomputed branches, and attribute-level narrowing of parsed_repr and .parsed

Test plan

  • uv run pytest test/core/test_base.py -k parsed
  • uv run pytest test/core/ test/typing/ -m "not qualitative"
  • uv run pytest test/backends/test_ollama.py -k parsed -m qualitative (requires Ollama)
  • uv run pytest test/backends/test_huggingface.py -k parsed -m qualitative (requires GPU)

@github-actions github-actions Bot added the enhancement New feature or request label Jun 17, 2026
When `format=MyModel` is passed to `act()`, `instruct()`, `aact()`, or
`ainstruct()` in both `functional.py` and `session.py`, the return type
now narrows to `ComputedModelOutputThunk[MyModel]` (or
`ModelOutputThunk[MyModel]` for the non-awaited async variant) instead
of `ComputedModelOutputThunk[str]`.

This eliminates the need for `cast(MyModel, result.value)` at call
sites. Runtime behaviour is unchanged; all overloads were already
dispatched to the same implementation body.

Changes:
- `functional.py` / `session.py` – new `@overload` stubs with
  `format: type[BaseModelSubclass]` for all four methods; implementation
  signatures broadened to `Any` to cover all overload combinations
- `test/typing/` – `assert_type` checks for the new overload resolution
  paths in all four typing-check modules
- `genstub.py` – `# type: ignore[return-value]` on four existing return
  sites that rely on the pre-narrowed `R` type variable which the new
  overloads can no longer infer
- `react.py` / `m_serve_example_response_format.py` – `# type: ignore`
  on call sites that pass a dynamic `format` value incompatible with the
  new stricter overload signatures

Closes generative-computing#1274

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the worktree-issue-1274 branch from abec17a to 7370b83 Compare June 17, 2026 10:53
Add brief inline comments to the three type: ignore additions
introduced by the format= overload threading, explaining why each
ignore is intentional rather than masking a real issue.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 marked this pull request as ready for review June 17, 2026 11:35
@planetf1 planetf1 requested a review from a team as a code owner June 17, 2026 11:35
@jakelorocco

Copy link
Copy Markdown
Contributor

I read #1282 before this one. I think this encounters the same issue. We need to resolve disparate typing issues, decide what we want to have decide the output type of a mot, and then propagate the actual type to the model output thunk so that users can actually utilize it without casting / overriding / asserting a type.

@planetf1 planetf1 requested a review from ajbozarth June 19, 2026 12:51
@planetf1

Copy link
Copy Markdown
Contributor Author

@markstur @AngeloDanducci @jakelorocco @ajbozarth — this PR has been open for 2 days with no reviews yet. Happy to answer questions or discuss the approach.

@ajbozarth ajbozarth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback from Claude — follow-ups inline.

Headline concern: the overloads narrow the result generic to MyModel, but at runtime no attribute on the resulting thunk is actually a MyModel instance for an Instruction(format=MyModel) call — so the type promise the PR makes can't be cashed in without a cast on the unwrap, which is exactly the footgun #1274 was trying to remove. Details inline.

Comment thread mellea/stdlib/session.py
format: type[BaseModelSubclass],
model_options: dict | None = None,
tool_calls: bool = False,
) -> ComputedModelOutputThunk[BaseModelSubclass]: ...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description's before/after says:

result, _ = act(action, ctx, backend, format=MyModel)
obj = result.value  # typed as MyModel

Under these overloads result does narrow to ComputedModelOutputThunk[MyModel], but ComputedModelOutputThunk.value is annotated -> str (mellea/core/base.py:875) — it doesn't depend on S. So result.value is still typed as str, and the only attribute the new S = MyModel actually narrows is parsed_repr: S | None.

But parsed_repr's runtime path is self._action._parse(self) (mellea/core/base.py:707), and Instruction._parse returns computed.value if computed.value is not None else "" (mellea/stdlib/components/instruction.py:236) — i.e., str, regardless of format=. So for the canonical m.act(Instruction(...), format=MyModel) pattern, result.parsed_repr is typed MyModel | None but is actually a str at runtime. pyright will accept result.parsed_repr.some_field, and it'll AttributeError at runtime — the exact silent-failure shape #1273 / #1274 were filed against, just relocated from .value to .parsed_repr.

The genstub path (where _parse calls model_validate_json) is the one place this narrowing is actually honored at runtime — and the PR has to # type: ignore the genstub return values anyway. So the PR ships a typed promise that's only kept for the one component type that already had a working unwrap.

Worth resolving with #1282's author before merge: a coherent end-state probably looks like making ComputedModelOutputThunk.parsed generic over S and returning S (so result.parsed: MyModel), backed by a runtime path that actually delivers a MyModel instance for Instruction(format=...). As-is, this PR removes the cast from the .value line at the cost of inviting one on .parsed_repr that the type system hides. Net safety is debatable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — parsed_repr has the same runtime gap, just relocated. Looking into the best fix before this merges.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime gap is real — parsed_repr routes through Instruction._parse which always returns str, so the narrowing is a lie at runtime. The fix is in #1282, which adds a .parsed property backed by model_validate_json. Once that merges, the story is clean:

result = session.act(action, ctx, backend, format=MyModel)
obj = result.parsed  # MyModel | None — typed and runtime-correct, no cast

I'll retarget the overloads here to .parsed after #1282 is merged. Moving this to draft until then.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now resolved in this PR. The collapse of #1282 into this branch means .parsed is available right here — it's backed by model_validate_json so it's both statically typed S | None and runtime-correct. Updated in 72b0e66e:

  • The session.py overload comment now names all three attributes and explicitly directs callers to .parsed (typed + runtime-safe) rather than parsed_repr (typed but runtime gap via Instruction._parse → str).
  • check_session.py now asserts assert_type(r.parsed, _M | None) at the call site, confirming the end-to-end narrowing holds.

parsed_repr's runtime gap for the Instruction path remains — fixing that requires changing _parse to be format-aware, which is a wider change left for a follow-up. The important thing is callers now have a clear, correct path via .parsed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #1313 to track the parsed_repr runtime gap for the Instruction path and explain why it wasn't fixed here. The issue covers the root cause (Instruction._parse ignores _format), why a proper fix needs Instruction to become generic, and that .parsed is the correct workaround in the meantime.

Comment thread docs/examples/m_serve/m_serve_example_response_format.py Outdated
Comment thread mellea/stdlib/components/genstub.py Outdated
Comment thread test/typing/check_session.py
Comment thread mellea/stdlib/functional.py
@planetf1 planetf1 enabled auto-merge June 22, 2026 11:42
@planetf1 planetf1 marked this pull request as draft June 22, 2026 14:37
auto-merge was automatically disabled June 22, 2026 14:37

Pull request was converted to draft

@planetf1 planetf1 force-pushed the worktree-issue-1274 branch from 52981cd to 61cb63a Compare June 22, 2026 15:07
@planetf1 planetf1 changed the title feat(types): thread format= overloads for cast-free structured output feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output Jun 22, 2026
@planetf1

Copy link
Copy Markdown
Contributor Author

@ajbozarth @jakelorocco lots of changes on this as I merged the two previously independent PRs. It's easier to see the objective and result in a single PR - so hope this is clearer

@planetf1 planetf1 marked this pull request as ready for review June 22, 2026 16:09
@planetf1 planetf1 requested a review from nrfulton as a code owner June 22, 2026 16:09
planetf1 added 7 commits June 22, 2026 17:21
Explain why the format= overloads can't narrow the genstub return type:
the overloads narrow the thunk's element type, but a genstub returns the
unwrapped inner value R, not the thunk, and parsed_repr (S | None) can't
be re-bound to R at this boundary. Add a TODO pointing at the clean shape
(ComputedModelOutputThunk[R] with the FunctionResponse[R] unwrap in a typed
parse step), noting it depends on the thunk-generics redesign out of scope here.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The existing check only confirmed the overload resolved to
ComputedModelOutputThunk[_M]; it did not pin what the attributes are typed.
Add check_act_format_attributes asserting parsed_repr narrows to `_M | None`
(what the overloads actually narrow) and documenting that `.value` stays
unconditionally `str` — the known limitation pending the coordinated
thunk-generics / `.parsed` redesign. Locks in what IS narrowed and calls
out what isn't.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Document that the act() implementation return type is widened to `Any` on
purpose: the @overload signatures own the precise S propagation and the
format= -> BaseModelSubclass narrowing, and tightening the body to the bare
`[S]` case would conflict with the format= and sampling overloads. Callers
always resolve against an overload, never the implementation body.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
A third union overload `format: type[BaseModelSubclass] | None` cannot narrow
cleanly: it overlaps the existing `format=None` overload, so the return type
collapses to the union and the narrowing is lost. The clean fix for a wrapper
forwarding a dynamic format is to branch on `format is None` so each call
matches a narrow overload. Add comments at the react and m_serve passthrough
sites explaining this and why the ignore is preferred over branching at those
single call sites.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The format= overloads narrow the thunk's generic element type, observable on
parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is
typed `-> str` unconditionally. parsed_repr also currently routes through
Instruction._parse (returns str), so parsed_repr.some_field type-checks but
AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to
fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed
redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to
match (was claiming .value narrows to MyModel).

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…e: ignore

Serving wrapper now calls instruct() in two branches so each matches a
narrow overload. The react.py site (already inside if format is not None)
is a separate problem requiring broader restructuring; documented in-place.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
GenerativeStub._parse already unwraps FunctionResponse[R] and returns R
at runtime. The thunk types parsed_repr as S | None (S = FunctionResponse[R])
because the overloads narrow S to the format type, not R. Replace the
return-value ignores with an explicit cast to make the coercion visible.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the worktree-issue-1274 branch from 3676f7a to c56d89d Compare June 22, 2026 16:21

@jakelorocco jakelorocco left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked with Nathan about this. We want to maintain the existing behavior; which I'm assuming we don't have clearly documented and which lead to this issue/PR.

  • format can determine the shape of the output (ie what tokens are allowed, etc...)
  • a component can declare the shape of the object generated from it (ie a Message components declares that it will result in another Message)

These two are usually related (and we should maybe make it easier to express this so that if a component requires a specific schema to parse, you don't have to pass re-define / pass in separate format var). But a component can also declare arbitrary parsing rules that can operate on several formats.

If you want automatic typing on the output of the mot, you should pass in a specific component. Otherwise, you'll have to model_validate the output to get the type from the raw string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

3 participants