feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284
feat(thunk+types): add .parsed property and wire format= overloads for cast-free structured output#1284planetf1 wants to merge 9 commits into
Conversation
When `format=MyModel` is passed to `act()`, `instruct()`, `aact()`, or `ainstruct()` in both `functional.py` and `session.py`, the return type now narrows to `ComputedModelOutputThunk[MyModel]` (or `ModelOutputThunk[MyModel]` for the non-awaited async variant) instead of `ComputedModelOutputThunk[str]`. This eliminates the need for `cast(MyModel, result.value)` at call sites. Runtime behaviour is unchanged; all overloads were already dispatched to the same implementation body. Changes: - `functional.py` / `session.py` – new `@overload` stubs with `format: type[BaseModelSubclass]` for all four methods; implementation signatures broadened to `Any` to cover all overload combinations - `test/typing/` – `assert_type` checks for the new overload resolution paths in all four typing-check modules - `genstub.py` – `# type: ignore[return-value]` on four existing return sites that rely on the pre-narrowed `R` type variable which the new overloads can no longer infer - `react.py` / `m_serve_example_response_format.py` – `# type: ignore` on call sites that pass a dynamic `format` value incompatible with the new stricter overload signatures Closes generative-computing#1274 Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
abec17a to
7370b83
Compare
Add brief inline comments to the three type: ignore additions introduced by the format= overload threading, explaining why each ignore is intentional rather than masking a real issue. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
I read #1282 before this one. I think this encounters the same issue. We need to resolve disparate typing issues, decide what we want to have decide the output type of a mot, and then propagate the actual type to the model output thunk so that users can actually utilize it without casting / overriding / asserting a type. |
|
@markstur @AngeloDanducci @jakelorocco @ajbozarth — this PR has been open for 2 days with no reviews yet. Happy to answer questions or discuss the approach. |
ajbozarth
left a comment
There was a problem hiding this comment.
Some feedback from Claude — follow-ups inline.
Headline concern: the overloads narrow the result generic to MyModel, but at runtime no attribute on the resulting thunk is actually a MyModel instance for an Instruction(format=MyModel) call — so the type promise the PR makes can't be cashed in without a cast on the unwrap, which is exactly the footgun #1274 was trying to remove. Details inline.
| format: type[BaseModelSubclass], | ||
| model_options: dict | None = None, | ||
| tool_calls: bool = False, | ||
| ) -> ComputedModelOutputThunk[BaseModelSubclass]: ... |
There was a problem hiding this comment.
The PR description's before/after says:
result, _ = act(action, ctx, backend, format=MyModel)
obj = result.value # typed as MyModelUnder these overloads result does narrow to ComputedModelOutputThunk[MyModel], but ComputedModelOutputThunk.value is annotated -> str (mellea/core/base.py:875) — it doesn't depend on S. So result.value is still typed as str, and the only attribute the new S = MyModel actually narrows is parsed_repr: S | None.
But parsed_repr's runtime path is self._action._parse(self) (mellea/core/base.py:707), and Instruction._parse returns computed.value if computed.value is not None else "" (mellea/stdlib/components/instruction.py:236) — i.e., str, regardless of format=. So for the canonical m.act(Instruction(...), format=MyModel) pattern, result.parsed_repr is typed MyModel | None but is actually a str at runtime. pyright will accept result.parsed_repr.some_field, and it'll AttributeError at runtime — the exact silent-failure shape #1273 / #1274 were filed against, just relocated from .value to .parsed_repr.
The genstub path (where _parse calls model_validate_json) is the one place this narrowing is actually honored at runtime — and the PR has to # type: ignore the genstub return values anyway. So the PR ships a typed promise that's only kept for the one component type that already had a working unwrap.
Worth resolving with #1282's author before merge: a coherent end-state probably looks like making ComputedModelOutputThunk.parsed generic over S and returning S (so result.parsed: MyModel), backed by a runtime path that actually delivers a MyModel instance for Instruction(format=...). As-is, this PR removes the cast from the .value line at the cost of inviting one on .parsed_repr that the type system hides. Net safety is debatable.
There was a problem hiding this comment.
You're right — parsed_repr has the same runtime gap, just relocated. Looking into the best fix before this merges.
There was a problem hiding this comment.
The runtime gap is real — parsed_repr routes through Instruction._parse which always returns str, so the narrowing is a lie at runtime. The fix is in #1282, which adds a .parsed property backed by model_validate_json. Once that merges, the story is clean:
result = session.act(action, ctx, backend, format=MyModel)
obj = result.parsed # MyModel | None — typed and runtime-correct, no castI'll retarget the overloads here to .parsed after #1282 is merged. Moving this to draft until then.
There was a problem hiding this comment.
Now resolved in this PR. The collapse of #1282 into this branch means .parsed is available right here — it's backed by model_validate_json so it's both statically typed S | None and runtime-correct. Updated in 72b0e66e:
- The
session.pyoverload comment now names all three attributes and explicitly directs callers to.parsed(typed + runtime-safe) rather thanparsed_repr(typed but runtime gap viaInstruction._parse → str). check_session.pynow assertsassert_type(r.parsed, _M | None)at the call site, confirming the end-to-end narrowing holds.
parsed_repr's runtime gap for the Instruction path remains — fixing that requires changing _parse to be format-aware, which is a wider change left for a follow-up. The important thing is callers now have a clear, correct path via .parsed.
There was a problem hiding this comment.
Filed #1313 to track the parsed_repr runtime gap for the Instruction path and explain why it wasn't fixed here. The issue covers the root cause (Instruction._parse ignores _format), why a proper fix needs Instruction to become generic, and that .parsed is the correct workaround in the meantime.
Pull request was converted to draft
52981cd to
61cb63a
Compare
|
@ajbozarth @jakelorocco lots of changes on this as I merged the two previously independent PRs. It's easier to see the objective and result in a single PR - so hope this is clearer |
Explain why the format= overloads can't narrow the genstub return type: the overloads narrow the thunk's element type, but a genstub returns the unwrapped inner value R, not the thunk, and parsed_repr (S | None) can't be re-bound to R at this boundary. Add a TODO pointing at the clean shape (ComputedModelOutputThunk[R] with the FunctionResponse[R] unwrap in a typed parse step), noting it depends on the thunk-generics redesign out of scope here. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The existing check only confirmed the overload resolved to ComputedModelOutputThunk[_M]; it did not pin what the attributes are typed. Add check_act_format_attributes asserting parsed_repr narrows to `_M | None` (what the overloads actually narrow) and documenting that `.value` stays unconditionally `str` — the known limitation pending the coordinated thunk-generics / `.parsed` redesign. Locks in what IS narrowed and calls out what isn't. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Document that the act() implementation return type is widened to `Any` on purpose: the @overload signatures own the precise S propagation and the format= -> BaseModelSubclass narrowing, and tightening the body to the bare `[S]` case would conflict with the format= and sampling overloads. Callers always resolve against an overload, never the implementation body. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
A third union overload `format: type[BaseModelSubclass] | None` cannot narrow cleanly: it overlaps the existing `format=None` overload, so the return type collapses to the union and the narrowing is lost. The clean fix for a wrapper forwarding a dynamic format is to branch on `format is None` so each call matches a narrow overload. Add comments at the react and m_serve passthrough sites explaining this and why the ignore is preferred over branching at those single call sites. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The format= overloads narrow the thunk's generic element type, observable on parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is typed `-> str` unconditionally. parsed_repr also currently routes through Instruction._parse (returns str), so parsed_repr.some_field type-checks but AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to match (was claiming .value narrows to MyModel). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…e: ignore Serving wrapper now calls instruct() in two branches so each matches a narrow overload. The react.py site (already inside if format is not None) is a separate problem requiring broader restructuring; documented in-place. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
GenerativeStub._parse already unwraps FunctionResponse[R] and returns R at runtime. The thunk types parsed_repr as S | None (S = FunctionResponse[R]) because the overloads narrow S to the format type, not R. Replace the return-value ignores with an explicit cast to make the coercion visible. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
3676f7a to
c56d89d
Compare
jakelorocco
left a comment
There was a problem hiding this comment.
I talked with Nathan about this. We want to maintain the existing behavior; which I'm assuming we don't have clearly documented and which lead to this issue/PR.
formatcan determine the shape of the output (ie what tokens are allowed, etc...)- a component can declare the shape of the object generated from it (ie a Message components declares that it will result in another Message)
These two are usually related (and we should maybe make it easier to express this so that if a component requires a specific schema to parse, you don't have to pass re-define / pass in separate format var). But a component can also declare arbitrary parsing rules that can operate on several formats.
If you want automatic typing on the output of the mot, you should pass in a specific component. Otherwise, you'll have to model_validate the output to get the type from the raw string.
Fixes #1273. Fixes #1274.
The problem
Getting a typed Pydantic instance from a
format=call required two error-prone manual steps:The return type was
ModelOutputThunk[Any]— no narrowing, no safety net.What this PR delivers
Two things had to land together to make this work end-to-end:
.parsedproperty (ComputedModelOutputThunk) — validatesself.valueviamodel_validate_jsonagainst the format type that was used at generation time. ReturnsS | None. Only lives onComputedModelOutputThunk, so it can't be called on an uncomputed thunk.format=overloads inact/aact/instruct/ainstruct(bothsession.pyandfunctional.py) — whenformat: type[BaseModelSubclass]is passed, the return type narrows toComputedModelOutputThunk[BaseModelSubclass]. The narrowing is observable on.parsed(typedS | None) andparsed_repr(alsoS | None);.valuestaysstrunconditionally.Together:
m.act(action, format=MyModel).parsedis typedMyModel | Noneand runtime-correct, with no cast at the call site.Why these two PRs were collapsed
These were originally two separate PRs (#1282 for
.parsed, #1284 for the overloads). Neither was useful alone:.parsedexists at runtime but is typedstr | Noneat every call site because the overloads don't yet bindS. Callers still need a cast.m.act(format=MyModel)narrows toComputedModelOutputThunk[MyModel], but there is no.parsedto use that narrowing. Callers fall back tocast(MyModel, result.value), which raisesAttributeErrorat runtime.@jakelorocco flagged this during review. Both halves are here now.
Implementation notes
All five built-in backends set
_formaton the thunk inpost_processing. Custom backends must setmot._formatthere too; the.parseddocstring has an explicitNote:for backend authors.Downstream wrappers that forward a dynamic
formatvalue keep a# type: ignore[assignment]with inline rationale (the clean fix is branching onformat is None; react.py and the m_serve example both have explanatory comments).The
parsed_reprattribute also narrows statically toS | Nonevia the overloads, but has a runtime gap for theInstructionpath:Instruction._parsecurrently ignores_formatand returnsstr. Use.parsedfor structured output — it is both statically typed and runtime-safe. The root cause is tracked in #1313.Compatibility
.valueis unchanged — existing callers are unaffected..parsedis new; nothing today relies on it.Testing
Nonefallback, invalid JSON, value unchanged, copy/deepcopy_formatpreservationtest_parsed_returns_pydantic_instancefor HuggingFace and Ollama backendsassert_typechecks intest/typing/covering all four methods, the computed/uncomputed branches, and attribute-level narrowing ofparsed_reprand.parsedTest plan
uv run pytest test/core/test_base.py -k parseduv run pytest test/core/ test/typing/ -m "not qualitative"uv run pytest test/backends/test_ollama.py -k parsed -m qualitative(requires Ollama)uv run pytest test/backends/test_huggingface.py -k parsed -m qualitative(requires GPU)