feat(thunk): add .parsed property to ComputedModelOutputThunk for structured output#1282
feat(thunk): add .parsed property to ComputedModelOutputThunk for structured output#1282planetf1 wants to merge 8 commits into
Conversation
…uctured output When `format=` is passed to `act()`/`instruct()`, the model returns a JSON string and `.value` has always held that raw JSON — not a Pydantic instance. Accessing `.label` (etc.) on `.value` silently raises `AttributeError` at runtime while pyright accepts the cast without complaint, leading to hard-to-debug silent failures. This commit adds: - `_format: type[pydantic.BaseModel] | None` attribute on `ModelOutputThunk` (initialised to `None`; propagated via `_copy_from`) - All five backends (`ollama`, `litellm`, `openai`, `huggingface`, `watsonx`) now set `mot._format = _format` in `post_processing()`, alongside the existing `generate_log.extra` artefact - `ComputedModelOutputThunk.parsed` property — calls `_format.model_validate_json(value)` when a format type is stored, returns `None` otherwise - Docstring updates on `ModelOutputThunk.value` and `Session.act()` pointing callers to `.parsed` when `format=` is used - Four unit tests covering: happy path, no-format returns None, invalid JSON raises `pydantic.ValidationError`, and `.value` is unaffected Closes generative-computing#1273. Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code
b1f9251 to
acb02f9
Compare
…es: to parsed - Add `_format = self._format` to `__copy__` and `__deepcopy__` so that copying a ComputedModelOutputThunk preserves the format type; previously a copied thunk would silently return None from .parsed even when the original had a format set. - Add `Raises: pydantic.ValidationError` to the `parsed` property docstring to document the exception callers must handle when the model returns malformed structured output. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
"no manual model_validate_json needed" is more accurate than "MyModel instance, no cast needed" — .parsed returns BaseModel | None, so static type narrowing still requires a cast; the value is just already deserialized. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
I think we should reconcile this change with the existing It seems to me that this change saves the user the hassle of validating the model but doesn't actually give them typing information since the parsed field returns a generic pydantic.BaseModel type. One solution is to propagate the pydantic type through the function signature to the returned model output thunk (which we do for action types right now), assuming we make the disparate typing options coherent. |
|
@nrfulton @jakelorocco @ajbozarth — this PR has been open for 2 days with no reviews yet. Happy to answer questions. |
ajbozarth
left a comment
There was a problem hiding this comment.
Some feedback from Claude — follow-ups inline.
|
|
The HuggingFace chat-path post_processing never assigned mot._format, so .parsed always returned None when format= was set via LocalHFBackend. All other backends (ollama, openai, litellm, watsonx) already set it. Also adds: - Copy/deepcopy unit tests verifying _format is preserved across copies - E2e tests in test_ollama and test_huggingface asserting .parsed returns a typed Pydantic instance end-to-end through each backend - Docstring note on ComputedModelOutputThunk.parsed warning custom-backend authors to set mot._format in their post_processing method Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The format= overloads narrow the thunk's generic element type, observable on parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is typed `-> str` unconditionally. parsed_repr also currently routes through Instruction._parse (returns str), so parsed_repr.some_field type-checks but AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to match (was claiming .value narrows to MyModel). Assisted-by: Claude Code
…mat type `.parsed` previously returned `pydantic.BaseModel | None`, so callers still needed `cast(MyModel, result.parsed)` for static narrowing — the gap @ajbozarth flagged on PR generative-computing#1282. Thread the format type through the thunk's existing type parameter `S`: `_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing `S` (rather than a second TypeVar) composes with the companion `format=` overloads on generative-computing#1274, which bind `S` to the supplied model so `m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]` and `.parsed` is typed `MyModel | None`. The `.parsed` body narrows `_format` to a pydantic type to call `model_validate_json`, then re-asserts the result as `S` — `S` is unbounded (it is `str` for plain instructions) so neither cast can be elided. Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type parameter for both a model-parameterized and a `str` thunk. Assisted-by: Claude Code
…mat type `.parsed` previously returned `pydantic.BaseModel | None`, so callers still needed `cast(MyModel, result.parsed)` for static narrowing — the gap @ajbozarth flagged on PR generative-computing#1282. Thread the format type through the thunk's existing type parameter `S`: `_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing `S` (rather than a second TypeVar) composes with the companion `format=` overloads on generative-computing#1274, which bind `S` to the supplied model so `m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]` and `.parsed` is typed `MyModel | None`. The `.parsed` body narrows `_format` to a pydantic type to call `model_validate_json`, then re-asserts the result as `S` — `S` is unbounded (it is `str` for plain instructions) so neither cast can be elided. Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type parameter for both a model-parameterized and a `str` thunk. Assisted-by: Claude Code
…mat type `.parsed` previously returned `pydantic.BaseModel | None`, so callers still needed `cast(MyModel, result.parsed)` for static narrowing — the gap @ajbozarth flagged on PR generative-computing#1282. Thread the format type through the thunk's existing type parameter `S`: `_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing `S` (rather than a second TypeVar) composes with the companion `format=` overloads on generative-computing#1274, which bind `S` to the supplied model so `m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]` and `.parsed` is typed `MyModel | None`. The `.parsed` body narrows `_format` to a pydantic type to call `model_validate_json`, then re-asserts the result as `S` — `S` is unbounded (it is `str` for plain instructions) so neither cast can be elided. Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type parameter for both a model-parameterized and a `str` thunk. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
… compat Two pyright-compatibility fixes for the `.parsed` property added in bb53ddb: 1. `_format` annotation: revert from `type[S] | None` to `type[pydantic.BaseModel] | None`. Using a covariant TypeVar (`S`) in the invariant `type[...]` position is semantically unsound and can confuse stricter pyright configurations. The field only ever holds pydantic model types at runtime; the concrete annotation is accurate and avoids the variance issue entirely. 2. `.parsed` body: replace the two-step string-quoted cast (`cast("type[pydantic.BaseModel]", …)` then `cast("S", …)`) with a single direct cast (`cast(S, self._format.model_validate_json(self.value))`). Pyright resolves TypeVar forward-references in cast strings differently across versions; using the TypeVar directly is unambiguous. 3. `check_parsed.py`: use `cast(X, cast(object, None))` instead of `cast(X, None)` to avoid basedpyright's `reportInvalidCast` diagnostic (None and X share no overlap); assign `assert_type(…)` results to `_` to silence `reportUnusedCallResult`. All three checkers (mypy, pyright 1.1.408+, basedpyright) now report clean on both changed files. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
5e969cf to
6118cea
Compare
…mat type `.parsed` previously returned `pydantic.BaseModel | None`, so callers still needed `cast(MyModel, result.parsed)` for static narrowing — the gap @ajbozarth flagged on PR generative-computing#1282. Thread the format type through the thunk's existing type parameter `S`: `_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing `S` (rather than a second TypeVar) composes with the companion `format=` overloads on generative-computing#1274, which bind `S` to the supplied model so `m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]` and `.parsed` is typed `MyModel | None`. The `.parsed` body narrows `_format` to a pydantic type to call `model_validate_json`, then re-asserts the result as `S` — `S` is unbounded (it is `str` for plain instructions) so neither cast can be elided. Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type parameter for both a model-parameterized and a `str` thunk. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…rings - ComputedModelOutputThunk.value now carries the same raw-JSON guidance as the parent override so callers inspecting the subclass see it directly. - .parsed opening paragraph no longer overstates current type inference: the format= overloads do not yet bind S to the format model, so the cast idiom is required; removed the false claim that m.act(format=MyModel) yields a typed thunk without a cast. - Added one-line distinction from parsed_repr to prevent confusion between the two properties. Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
…mat type `.parsed` previously returned `pydantic.BaseModel | None`, so callers still needed `cast(MyModel, result.parsed)` for static narrowing — the gap @ajbozarth flagged on PR generative-computing#1282. Thread the format type through the thunk's existing type parameter `S`: `_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing `S` (rather than a second TypeVar) composes with the companion `format=` overloads on generative-computing#1274, which bind `S` to the supplied model so `m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]` and `.parsed` is typed `MyModel | None`. The `.parsed` body narrows `_format` to a pydantic type to call `model_validate_json`, then re-asserts the result as `S` — `S` is unbounded (it is `str` for plain instructions) so neither cast can be elided. Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type parameter for both a model-parameterized and a `str` thunk. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The format= overloads narrow the thunk's generic element type, observable on parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is typed `-> str` unconditionally. parsed_repr also currently routes through Instruction._parse (returns str), so parsed_repr.some_field type-checks but AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to match (was claiming .value narrows to MyModel). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
|
Closing in favour of #1284, which combines this PR with the companion overload-binding work from #1274. During review @jakelorocco correctly noted that without the All commits from this branch are included in #1284 (rebased on top of upstream/main). The review history and resolved threads from this PR are referenced there. A backup of this branch is preserved at |
Pull request was closed
…re in one PR - session.py overload comment now explicitly names all three attributes (.parsed = typed + runtime-correct via model_validate_json, .parsed_repr = statically typed but runtime gap via Instruction._parse → str, .value = str unconditionally) and directs callers to .parsed for structured output. - check_session.py: add assert_type(r.parsed, _M | None) alongside the existing parsed_repr assertion; update the KNOWN LIMITATION comment now that .parsed is in this same PR rather than deferred to generative-computing#1282. - check_parsed.py: update module docstring to reflect that the format= overloads are now in this PR, closing the "end-to-end" gap previously noted as pending. Signed-off-by: Nigel Jones <jonesn@uk.ibm.com> Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
The format= overloads narrow the thunk's generic element type, observable on parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is typed `-> str` unconditionally. parsed_repr also currently routes through Instruction._parse (returns str), so parsed_repr.some_field type-checks but AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to match (was claiming .value narrows to MyModel). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Fixes #1273.
Why
When
format=is passed toact(), the model returns JSON and.valueholds the raw string — not a Pydantic instance. The natural workaround iscast(MyModel, result.value), which satisfies the type checker but raisesAttributeErrorat runtime. Because that error is often swallowed by a broadexcepthandler, callers fall back silently on every invocation rather than surfacing a clear failure.@generativeavoids this, but it requires a fixed function signature. Dynamically-built prompts in a loop — a common pattern for batch classification — cannot use it.Summary
ComputedModelOutputThunk.parsed— validates the raw JSON string viamodel_validate_jsonand returns the Pydantic instance typed asS | None. ReturnsNonewhen noformat=type was passed.parsedcan use it.valueandact()point callers at.parsed.Nonefallback, invalid JSON, value unchanged, copy/deepcopy_formatpreservation, and end-to-end through Ollama and HuggingFace backends.Before / After
Compatibility
.valueis unchanged — existing callers are unaffected..parsedis a new property; nothing relies on it today.Backendwithout settingmot._formatwill returnNonefrom.parsedeven whenformat=is passed. The.parseddocstring has aNote:calling this out explicitly for backend authors. The built-in backends are all updated.What is not in this PR
.parsedis now typedS | None, so aComputedModelOutputThunk[MyModel]exposes.parsedasMyModel | Nonewith no cast at the call site. The remaining gap is thatm.act(format=MyModel)does not yet inferComputedModelOutputThunk[MyModel]— the call site still resolves toComputedModelOutputThunk[Any]. That binding is tracked in #1274.Test plan
uv run pytest test/core/test_base.py -k parseduv run pytest test/core/ -m "not qualitative"uv run pytest test/backends/test_ollama.py -k parsed -m qualitative(requires Ollama)uv run pytest test/backends/test_huggingface.py -k parsed -m qualitative(requires GPU)