feat(thunk): add .parsed property to ComputedModelOutputThunk for structured output #1282

Closed
planetf1 wants to merge 8 commits into generative-computing:main from planetf1:worktree-issue-1273

Conversation

@planetf1

@planetf1 planetf1 commented Jun 17, 2026

Contributor

Fixes #1273.

Why

When format= is passed to act(), the model returns JSON and .value holds the raw string, not a Pydantic instance. The natural workaround is cast(MyModel, result.value), which satisfies the type checker but is a no-op at runtime: the value is still a str, so the first attribute access raises AttributeError. Because that error is often swallowed by a broad except handler, callers fall back silently on every invocation rather than surfacing a clear failure.
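
To make the failure mode concrete, here is a minimal sketch that uses only the standard library. `Result` is a hypothetical dataclass standing in for the Pydantic model, and `classify` is an invented caller rather than code from the repository. The only point it illustrates is that `typing.cast` returns its argument unchanged at runtime.

```python
import json
from dataclasses import dataclass
from typing import cast


@dataclass
class Result:
    """Hypothetical stand-in for the Pydantic model passed as format=."""

    label: str


def classify(raw_value: str) -> str:
    """Mimic a caller that casts the raw JSON string and guards with a broad except."""
    try:
        # cast() only informs the type checker; at runtime it returns raw_value as-is.
        classification = cast(Result, raw_value)
        return classification.label  # AttributeError: 'str' object has no attribute 'label'
    except Exception:
        # The broad handler hides the AttributeError, so every call takes the fallback.
        return "unknown"


raw = json.dumps({"label": "yes"})  # what .value holds when format= is used
print(classify(raw))  # prints "unknown" on every call
print(type(cast(Result, raw)).__name__)  # prints "str"
```

The fallback value looks like a legitimate classification, which is why the problem tends to go unnoticed until someone inspects the results.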

@generative avoids this, but it requires a fixed function signature. Dynamically-built prompts in a loop — a common pattern for batch classification — cannot use it.

Summary

  • Adds ComputedModelOutputThunk.parsed — validates the raw JSON string via model_validate_json and returns the Pydantic instance typed as S | None. Returns None when no format= type was passed.
  • All five built-in backends store the format type on the thunk so parsed can use it.
  • Docstring notes on value and act() point callers at .parsed.
  • Unit tests covering the happy path, None fallback, invalid JSON, value unchanged, copy/deepcopy _format preservation, and end-to-end through Ollama and HuggingFace backends.
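
The behaviour listed above can be sketched roughly as follows. This is not the implementation in mellea/core/base.py: `ThunkSketch` is a hypothetical, heavily simplified stand-in for ComputedModelOutputThunk, and `Result` is a dataclass with a hand-written `model_validate_json` classmethod so that the sketch runs without Pydantic installed. With a real Pydantic model, malformed output raises `pydantic.ValidationError` rather than the plain `ValueError` seen here.

```python
import json
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar, cast

S = TypeVar("S")


@dataclass
class Result:
    """Stand-in for a Pydantic model; only model_validate_json is imitated."""

    label: str

    @classmethod
    def model_validate_json(cls, data: str) -> "Result":
        payload = json.loads(data)  # raises ValueError on malformed JSON
        return cls(label=payload["label"])


class ThunkSketch(Generic[S]):
    """Hypothetical, simplified stand-in for ComputedModelOutputThunk."""

    def __init__(self, value: str, fmt: Optional[type] = None) -> None:
        self.value = value  # raw model output, always a str
        self._format = fmt  # recorded by the backend when format= was passed

    @property
    def parsed(self) -> Optional[S]:
        if self._format is None:
            return None  # no format= type was supplied
        # Validate on access; .value itself is left untouched.
        return cast(S, self._format.model_validate_json(self.value))


structured: ThunkSketch[Result] = ThunkSketch('{"label": "yes"}', Result)
plain: ThunkSketch[str] = ThunkSketch("just text")

print(structured.parsed)  # a Result instance
print(plain.parsed)       # None, because no format type was stored
print(structured.value)   # still the raw JSON string
```

Validating lazily in the property, instead of replacing `.value`, is what keeps existing callers unaffected.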

Before / After

# Before
result = m.act(Instruction("Say yes or no"), format=Result)
classification = cast(Result, result.value)
print(classification.label)  # AttributeError: 'str' object has no attribute 'label'

# After
result = m.act(Instruction("Say yes or no"), format=Result)
print(result.parsed.label)   # works

Compatibility

  • .value is unchanged — existing callers are unaffected.
  • .parsed is a new property; nothing relies on it today.
  • Custom backends that subclass Backend without setting mot._format will return None from .parsed even when format= is passed. The .parsed docstring has a Note: calling this out explicitly for backend authors. The built-in backends are all updated.
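
For backend authors, the last point can be illustrated with a small sketch. Everything here is hypothetical: `MotSketch`, `LegacyBackend`, `UpdatedBackend`, and the `post_processing(mot, _format)` signature are assumptions made for illustration, and the real Backend hook may take different arguments. Only the assignment `mot._format = _format` comes from the PR.

```python
from typing import Optional


class Result:
    """Hypothetical stand-in for the model type passed as format=."""


class MotSketch:
    """Hypothetical minimal thunk holding only the fields this sketch needs."""

    def __init__(self, value: str) -> None:
        self.value = value
        self._format: Optional[type] = None  # initialised to None, as in the PR


class LegacyBackend:
    """Custom backend written before .parsed existed."""

    def post_processing(self, mot: MotSketch, _format: Optional[type]) -> None:
        pass  # never records the format type, so .parsed would return None


class UpdatedBackend:
    """Custom backend that follows the .parsed docstring note."""

    def post_processing(self, mot: MotSketch, _format: Optional[type]) -> None:
        mot._format = _format  # the one line .parsed depends on


legacy, updated = MotSketch("{}"), MotSketch("{}")
LegacyBackend().post_processing(legacy, Result)
UpdatedBackend().post_processing(updated, Result)
print(legacy._format)   # None
print(updated._format)  # the Result class
```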

What is not in this PR

.parsed is now typed S | None, so a ComputedModelOutputThunk[MyModel] exposes .parsed as MyModel | None with no cast at the call site. The remaining gap is that m.act(format=MyModel) does not yet infer ComputedModelOutputThunk[MyModel] — the call site still resolves to ComputedModelOutputThunk[Any]. That binding is tracked in #1274.

Test plan

  • uv run pytest test/core/test_base.py -k parsed
  • uv run pytest test/core/ -m "not qualitative"
  • uv run pytest test/backends/test_ollama.py -k parsed -m qualitative (requires Ollama)
  • uv run pytest test/backends/test_huggingface.py -k parsed -m qualitative (requires GPU)

@github-actions github-actions Bot added the enhancement New feature or request label Jun 17, 2026
…uctured output

When `format=` is passed to `act()`/`instruct()`, the model returns a JSON string and
`.value` has always held that raw JSON, not a Pydantic instance.  Accessing `.label`
(etc.) on `.value` raises `AttributeError` at runtime even though pyright accepts
the cast without complaint, leading to hard-to-debug silent failures when a broad
handler swallows the error.

This commit adds:
- `_format: type[pydantic.BaseModel] | None` attribute on `ModelOutputThunk` (initialised
  to `None`; propagated via `_copy_from`)
- All five backends (`ollama`, `litellm`, `openai`, `huggingface`, `watsonx`) now set
  `mot._format = _format` in `post_processing()`, alongside the existing
  `generate_log.extra` artefact
- `ComputedModelOutputThunk.parsed` property — calls `_format.model_validate_json(value)`
  when a format type is stored, returns `None` otherwise
- Docstring updates on `ModelOutputThunk.value` and `Session.act()` pointing callers to
  `.parsed` when `format=` is used
- Four unit tests covering: happy path, no-format returns None, invalid JSON raises
  `pydantic.ValidationError`, and `.value` is unaffected

Closes generative-computing#1273.

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
@planetf1 planetf1 force-pushed the worktree-issue-1273 branch from b1f9251 to acb02f9 Compare June 17, 2026 09:07
planetf1 added 2 commits June 17, 2026 12:15
…es: to parsed

- Add `_format = self._format` to `__copy__` and `__deepcopy__` so that
  copying a ComputedModelOutputThunk preserves the format type; previously
  a copied thunk would silently return None from .parsed even when the
  original had a format set.
- Add `Raises: pydantic.ValidationError` to the `parsed` property docstring
  to document the exception callers must handle when the model returns
  malformed structured output.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
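
The copy fix described in this commit can be sketched as below. `ThunkSketch` and `Result` are hypothetical stand-ins, and the real `__copy__` and `__deepcopy__` methods copy more state than shown; the sketch only assumes that a hand-written copy hook must carry `_format` across explicitly.

```python
import copy
from typing import Optional


class Result:
    """Hypothetical stand-in for the model type passed as format=."""


class ThunkSketch:
    """Hypothetical stand-in for a thunk with custom copy hooks."""

    def __init__(self, value: str, fmt: Optional[type] = None) -> None:
        self.value = value
        self._format = fmt

    def __copy__(self) -> "ThunkSketch":
        clone = ThunkSketch(self.value)
        # Without this line the copy's .parsed would return None.
        clone._format = self._format
        return clone

    def __deepcopy__(self, memo: dict) -> "ThunkSketch":
        clone = ThunkSketch(copy.deepcopy(self.value, memo))
        memo[id(self)] = clone
        # The format is a class object, so it is shared by reference.
        clone._format = self._format
        return clone


original = ThunkSketch('{"label": "yes"}', Result)
print(copy.copy(original)._format is Result)      # True
print(copy.deepcopy(original)._format is Result)  # True
```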
"no manual model_validate_json needed" is more accurate than
"MyModel instance, no cast needed" — .parsed returns BaseModel | None,
so static type narrowing still requires a cast; the value is just
already deserialized.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 marked this pull request as ready for review June 17, 2026 11:23
@planetf1 planetf1 requested review from a team, jakelorocco and nrfulton as code owners June 17, 2026 11:23
@planetf1 planetf1 requested a review from ajbozarth June 17, 2026 11:23
@jakelorocco

Contributor

I think we should reconcile this change with the existing .parsed_repr field. That field is technically supposed to support the structured form of the data. However, the action used to generate the mot currently determines the shape of that data (which is reflected in the type system). We need to rectify these competing typing options.

It seems to me that this change saves the user the hassle of validating the model but doesn't actually give them typing information since the parsed field returns a generic pydantic.BaseModel type. One solution is to propagate the pydantic type through the function signature to the returned model output thunk (which we do for action types right now), assuming we make the disparate typing options coherent.

@planetf1

Contributor Author

@nrfulton @jakelorocco @ajbozarth — this PR has been open for 2 days with no reviews yet. Happy to answer questions.

@ajbozarth ajbozarth left a comment

Contributor

Some feedback from Claude — follow-ups inline.

Comment thread mellea/backends/huggingface.py
Comment thread mellea/core/base.py Outdated
Comment thread mellea/core/base.py
Comment thread mellea/core/base.py
Comment thread test/core/test_base.py
@planetf1

Contributor Author

parsed_repr and .parsed aren't actually overlapping yet — Instruction._parse is a string pass-through, so parsed_repr holds the raw JSON string even when format= is set. The two fields are independent with nothing connecting them. The real question — whether format= should override S so parsed_repr carries the validated model, at which point .parsed could just delegate to it — belongs in #1274. I've added that context there. Is there anything specific you'd want this PR to get right before that work happens?

planetf1 added 2 commits June 22, 2026 11:45
The HuggingFace chat-path post_processing never assigned mot._format,
so .parsed always returned None when format= was set via LocalHFBackend.
All other backends (ollama, openai, litellm, watsonx) already set it.

Also adds:
- Copy/deepcopy unit tests verifying _format is preserved across copies
- E2e tests in test_ollama and test_huggingface asserting .parsed returns
  a typed Pydantic instance end-to-end through each backend
- Docstring note on ComputedModelOutputThunk.parsed warning custom-backend
  authors to set mot._format in their post_processing method

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 requested a review from ajbozarth June 22, 2026 10:51
@planetf1 planetf1 enabled auto-merge June 22, 2026 11:42
planetf1 added a commit to planetf1/mellea that referenced this pull request Jun 22, 2026
The format= overloads narrow the thunk's generic element type, observable on
parsed_repr (S | None), not on .value — ComputedModelOutputThunk.value is
typed `-> str` unconditionally. parsed_repr also currently routes through
Instruction._parse (returns str), so parsed_repr.some_field type-checks but
AttributeErrors at runtime: the same silent-failure shape generative-computing#1274 set out to
fix, relocated to parsed_repr. Add a TODO pointing at the coordinated .parsed
redesign (PR generative-computing#1282) as the proper fix, out of scope here. PR body updated to
match (was claiming .value narrows to MyModel).

Assisted-by: Claude Code
planetf1 added a commit to planetf1/mellea that referenced this pull request Jun 22, 2026
…mat type

`.parsed` previously returned `pydantic.BaseModel | None`, so callers still
needed `cast(MyModel, result.parsed)` for static narrowing — the gap
@ajbozarth flagged on PR generative-computing#1282.

Thread the format type through the thunk's existing type parameter `S`:
`_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing
`S` (rather than a second TypeVar) composes with the companion `format=`
overloads on generative-computing#1274, which bind `S` to the supplied model so
`m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]`
and `.parsed` is typed `MyModel | None`.

The `.parsed` body narrows `_format` to a pydantic type to call
`model_validate_json`, then re-asserts the result as `S` — `S` is unbounded
(it is `str` for plain instructions) so neither cast can be elided.

Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type
parameter for both a model-parameterized and a `str` thunk.

Assisted-by: Claude Code
planetf1 added 2 commits June 22, 2026 15:15
…mat type

`.parsed` previously returned `pydantic.BaseModel | None`, so callers still
needed `cast(MyModel, result.parsed)` for static narrowing — the gap
@ajbozarth flagged on PR generative-computing#1282.

Thread the format type through the thunk's existing type parameter `S`:
`_format` is now `type[S] | None` and `.parsed` returns `S | None`. Reusing
`S` (rather than a second TypeVar) composes with the companion `format=`
overloads on generative-computing#1274, which bind `S` to the supplied model so
`m.act(action, format=MyModel)` yields `ComputedModelOutputThunk[MyModel]`
and `.parsed` is typed `MyModel | None`.

The `.parsed` body narrows `_format` to a pydantic type to call
`model_validate_json`, then re-asserts the result as `S` — `S` is unbounded
(it is `str` for plain instructions) so neither cast can be elided.

Add `test/typing/check_parsed.py` asserting `.parsed` tracks the type
parameter for both a model-parameterized and a `str` thunk.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
… compat

Three type-checker compatibility fixes for the `.parsed` property added in bb53ddb:

1. `_format` annotation: revert from `type[S] | None` to
   `type[pydantic.BaseModel] | None`. Using a covariant TypeVar (`S`) in the
   invariant `type[...]` position is semantically unsound and can confuse
   stricter pyright configurations. The field only ever holds pydantic model
   types at runtime; the concrete annotation is accurate and avoids the
   variance issue entirely.

2. `.parsed` body: replace the two-step string-quoted cast
   (`cast("type[pydantic.BaseModel]", …)` then `cast("S", …)`) with a single
   direct cast (`cast(S, self._format.model_validate_json(self.value))`).
   Pyright resolves TypeVar forward-references in cast strings differently
   across versions; using the TypeVar directly is unambiguous.

3. `check_parsed.py`: use `cast(X, cast(object, None))` instead of
   `cast(X, None)` to avoid basedpyright's `reportInvalidCast` diagnostic
   (None and X share no overlap); assign `assert_type(…)` results to `_` to
   silence `reportUnusedCallResult`.

All three checkers (mypy, pyright 1.1.408+, basedpyright) now report clean
on both changed files.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
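
The idioms from item 3 can be sketched as follows. This is not the contents of `test/typing/check_parsed.py`: `Model`, `ThunkSketch`, and `check` are hypothetical names, and the `assert_type` fallback is a runtime no-op added only so the sketch also runs on interpreters older than Python 3.11. A file like this is meant to be read by a type checker; running it proves very little.

```python
from typing import Generic, Optional, TypeVar, cast

try:
    from typing import assert_type  # Python 3.11+
except ImportError:  # older interpreters: runtime no-op with the same shape
    def assert_type(val, typ):  # type: ignore[no-redef]
        return val

S = TypeVar("S", covariant=True)  # described as covariant in the commit message


class Model:
    """Hypothetical stand-in for a user's Pydantic model."""


class ThunkSketch(Generic[S]):
    """Hypothetical stand-in exposing only the .parsed signature."""

    @property
    def parsed(self) -> Optional[S]:
        return None


# A typing check only needs a value of the right static type, never a real
# instance. Casting through object avoids a direct None-to-ThunkSketch cast,
# which basedpyright reports as reportInvalidCast.
model_thunk = cast(ThunkSketch[Model], cast(object, None))
str_thunk = cast(ThunkSketch[str], cast(object, None))


def check(m: ThunkSketch[Model], s: ThunkSketch[str]) -> None:
    # Assigning to _ keeps reportUnusedCallResult quiet.
    _ = assert_type(m.parsed, Optional[Model])
    _ = assert_type(s.parsed, Optional[str])
```

At runtime both casts simply return `None`, so `check` is exercised here only with real instances.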
@planetf1 planetf1 force-pushed the worktree-issue-1273 branch from 5e969cf to 6118cea Compare June 22, 2026 14:16
…rings

- ComputedModelOutputThunk.value now carries the same raw-JSON guidance as
  the parent override so callers inspecting the subclass see it directly.
- .parsed opening paragraph no longer overstates current type inference: the
  format= overloads do not yet bind S to the format model, so the cast idiom
  is required; removed the false claim that m.act(format=MyModel) yields a
  typed thunk without a cast.
- Added one-line distinction from parsed_repr to prevent confusion between
  the two properties.

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1

Contributor Author

Closing in favour of #1284, which combines this PR with the companion overload-binding work from #1274.

During review @jakelorocco correctly noted that without the format= overloads, .parsed is typed str | None at every call site — meaning callers still need a manual cast, which is exactly the ergonomic problem this feature was meant to solve. #1284 delivers both halves together so the feature is coherent end-to-end.

All commits from this branch are included in #1284 (rebased on top of upstream/main). The review history and resolved threads from this PR are referenced there. A backup of this branch is preserved at backup/issue-1273-pre-collapse on origin.

@planetf1 planetf1 closed this Jun 22, 2026
auto-merge was automatically disabled June 22, 2026 15:08

Pull request was closed

planetf1 added a commit to planetf1/mellea that referenced this pull request Jun 22, 2026
…re in one PR

- session.py overload comment now explicitly names all three attributes
  (.parsed = typed + runtime-correct via model_validate_json, .parsed_repr =
  statically typed but runtime gap via Instruction._parse → str, .value = str
  unconditionally) and directs callers to .parsed for structured output.
- check_session.py: add assert_type(r.parsed, _M | None) alongside the existing
  parsed_repr assertion; update the KNOWN LIMITATION comment now that .parsed is
  in this same PR rather than deferred to generative-computing#1282.
- check_parsed.py: update module docstring to reflect that the format= overloads
  are now in this PR, closing the "end-to-end" gap previously noted as pending.

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

bug: structured output via act() silently returns JSON string, not Pydantic instance

3 participants