Skip to content

refactor: distinguish between init and attribute types in testing state classes#2331

Open
tonyandrewmeyer wants to merge 15 commits into
canonical:mainfrom
tonyandrewmeyer:scenario-state-init
Open

refactor: distinguish between init and attribute types in testing state classes#2331
tonyandrewmeyer wants to merge 15 commits into
canonical:mainfrom
tonyandrewmeyer:scenario-state-init

Conversation

@tonyandrewmeyer

@tonyandrewmeyer tonyandrewmeyer commented Feb 17, 2026

Copy link
Copy Markdown
Collaborator

When designing the Scenario 7 API we introduced kw-only args, and originally had custom __init__ for each state class to support that. We decided to change that because it felt busy and a lot of maintenance.

However, we currently have an unfortunate mismatch between some of the types accepted to create an instance of a state class and the type the corresponding attribute will be. For example, the init might accept any Mapping but we know the attribute will always be a dict. It would be nice to provide that information to users.

Now that we are using Python 3.10+, we do have some classes without this issue that can continue to use the dataclasses generated __init__. However, there are many that would be better as more explicit, and I am not convinced it's too much work to maintain.

We opened the door to this in #2274 adjusting CheckInfo. This PR applies the same improvement to the rest of the state classes.

A couple of small behaviour changes fall out of this:

  • The *_addresses, *_subnets, redacted, ca_certificates, and Secret.remote_grants arguments now reject a bare str (or bytes) up front. Previously this would have silently iterated the string character-by-character, leaving you with ['s', 'o', 'm', ...] in the attribute.
  • Passing app_status=None or unit_status=None to State now coerces to UnknownStatus(), where previously it would have raised TypeError. Explicit None is now treated the same as omitting the argument.

Fixes #2152

@james-garner-canonical james-garner-canonical left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of making this change, thanks for taking care of this. I have a number of suggestions around typing and defaults, which I've made on individual lines, though they typically apply to more lines across the PR -- but I figured I'd keep the comments fewer than they'd otherwise be ...

Do you think this change warrants some additional unit tests, or are you happy that the existing tests would catch any errors in this PR? If the latter, please mention the relevant test suites.

Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/tests/test_e2e/test_state.py
@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

Do you think this change warrants some additional unit tests, or are you happy that the existing tests would catch any errors in this PR? If the latter, please mention the relevant test suites.

In terms of not breaking the __init__ signature, I feel like the tests from here downwards should cover that (right number of positional/keyword-only arguments in particular), and they should also cover the behaviour of __init__ in terms of forcing immutability (same file, the tests following on from the previous ones). So I feel comfortable that any regressions in this PR would be caught.

In terms of tests for the changes, I'm not super keen on having tests like:

c = CloudCredential(auth_type="foo", redacted=['a', 'b', 'c'])
assert isinstance(c.redacted, list)

I know we have some tests where we expect pyright to find issues, but I'm not sure it's the right move to add something like that for this either.

Do you have any suggestions in terms of tests?

@james-garner-canonical

Copy link
Copy Markdown
Contributor

Do you think this change warrants some additional unit tests, or are you happy that the existing tests would catch any errors in this PR? If the latter, please mention the relevant test suites.

In terms of not breaking the __init__ signature, I feel like the tests from here downwards should cover that (right number of positional/keyword-only arguments in particular), and they should also cover the behaviour of __init__ in terms of forcing immutability (same file, the tests following on from the previous ones). So I feel comfortable that any regressions in this PR would be caught.

Nice!

In terms of tests for the changes, I'm not super keen on having tests like:

c = CloudCredential(auth_type="foo", redacted=['a', 'b', 'c'])
assert isinstance(c.redacted, list)

I know we have some tests where we expect pyright to find issues, but I'm not sure it's the right move to add something like that for this either.

Do you have any suggestions in terms of tests?

I wouldn't mind seeing tests a bit like the one you're not keen on, explicitly encoding (from a user perspective) the type conversion and copying behaviour that we're implementing.

redacted = ['a', 'b', 'c']
...
c = CloudCredential(auth_type="foo", redacted=redacted, ...)
assert isinstance(c.redacted, list)  # or tuple if we go that way
assert c.redacted == redacted
assert c.redacted is not redacted
...

The existing tests probably do cover a lot of this, but a lot of them are hard to follow at a glance due to the parametrization and abstraction.

@dimaqq dimaqq left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like this.
If it works, it's fair to merge :)

Happy to leave the details to James.

Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

@james-garner-canonical brought up the excellent point that these are frozen dataclasses that we want people to treat as immutable. So giving their type checker information that they have a list (rather than an immutable Sequence) or a dict (rather than an immutable Mapping) leads them to where we don't want to go.

So rejecting this instead.

@tonyandrewmeyer tonyandrewmeyer deleted the scenario-state-init branch February 19, 2026 23:26
@tonyandrewmeyer tonyandrewmeyer restored the scenario-state-init branch March 21, 2026 08:20
@tonyandrewmeyer tonyandrewmeyer marked this pull request as draft March 21, 2026 08:20
@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review March 26, 2026 21:48
@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

@james-garner-canonical I've adjusted per the discussion we had earlier in the week, and this should be good for reviewing again now, thanks!

@james-garner-canonical james-garner-canonical left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction here, and I definitely think it's worth making these changes to decouple the __init__ argument typing from the attribute typing.

I've flagged a number of items that I think require some further thought before merging.

Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py Outdated
Comment thread testing/src/scenario/state.py
Comment thread testing/src/scenario/state.py
Comment thread testing/src/scenario/state.py Outdated
Comment on lines +1519 to +1521
object.__setattr__(self, 'content', dict(content))
object.__setattr__(self, '_data_type_name', _data_type_name)
_deepcopy_mutable_fields(self)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
object.__setattr__(self, 'content', dict(content))
object.__setattr__(self, '_data_type_name', _data_type_name)
_deepcopy_mutable_fields(self)
object.__setattr__(self, 'content', copy.deepcopy(content))
object.__setattr__(self, '_data_type_name', _data_type_name)

Comment on lines -1575 to +1698
relations: Iterable[RelationBase] = dataclasses.field(default_factory=frozenset)
relations: frozenset[RelationBase]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Collection instead for all these attributes?

Comment thread testing/tests/test_e2e/test_state.py
Comment thread testing/src/scenario/state.py Outdated
@dimaqq

dimaqq commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

I'll wait for Tony and James to hash it out.

Comment thread testing/src/scenario/state.py

@dimaqq dimaqq left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about settings up a working group (maybe James and Tony only) to figure out the minutae?


AnyJson = str | bool | dict[str, 'AnyJson'] | int | float | list['AnyJson']
RawSecretRevisionContents = RawDataBagContents = dict[str, str]
RawSecretRevisionContents = RawDataBagContents = Mapping[str, str]

@dimaqq dimaqq Apr 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I'm not convinced about, because it's an "in-out" field:

# user passes data in, Mapping is great here
rel = Relation(..., remote_app_data=this_is_now_a_mapping)
state = State(relations={rel})
# user inspects of modified data, dict is better here
state = ctx.run(..., state)
rel = state.get_relation(rel)

# what we expect charmers to do, Mapping is helpful
assert rel.remote_units_data[unit]["foo"] == '"bar"'

# what charmers may also do, I think thaththis PR is a breaking change wrt. typing
rel.remote_units_data[unit].setdefault("foo", '"bar"')

The intention of having this field typed as a Mapping is good, IMO.
The practical implications though, I'm not sure about.

@james-garner-canonical james-garner-canonical left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, apologies for the delays with this one. Thanks for all the work and super-tox-ing.


I've commented in one place about Iterable/Sequence[str] typing permitting bare str arguments. This is largely a pre-existing issue, except that inlining _deepcopy_mutable_fields leads to a behaviour change where a bare str now gets list called on it. Either way it would have been a technical error that should always have resulted in a runtime error at some point (though would it always have?). But if it's not a runtime error then we are introducing a behaviour change that could show up in state.

I'd be on board with adding an explicit check and error for bare str if that doesn't break existing tests.

identity_endpoint: str | None = None,
storage_endpoint: str | None = None,
credential: CloudCredential | None = None,
ca_certificates: Iterable[str] = (),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applied before the PR when the argument was typed as Sequence[str] -- ca_credentials='some string' is legal at type checking time.

Previously, _deepcopy_mutable_variables would have left us with ca_certificates = 'some string'. Now we'd get ca_certificates = ['s', 'o', 'm', ...].

I assume that both of these are bad and may result in an error at some later point, but maybe they can silently pass through a test in some cases? Presumably this would ideally be an error straight away. Alternatively, a legal shorthand for ca_certificates = ['some string'].

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added a _list_of_str helper that raises StateValidationError when a bare str is passed where an iterable of strings is expected, and applied it to redacted, ca_certificates, ingress_addresses, egress_subnets, and the values in Secret.remote_grants. Existing tests still pass and there's a new test_bare_str_rejected parametrised test covering each site.

Add a _list_of_str helper that raises StateValidationError when a bare
str is passed where an iterable of strings is expected, and apply it to
redacted, ca_certificates, ingress_addresses, egress_subnets, and the
values in Secret.remote_grants.
- _list_of_str now also rejects bytes (which iterate as ints).
- State.config is now copied so the caller's dict isn't aliased.
- Comment Secret._tracked_revision/_latest_revision: class attrs, not
  fields, deliberately excluded from __eq__.
- Comment RelationBase: still uses dataclass-generated __init__,
  asymmetry with the rest of the module is deliberate.
- Comment Container's mixed deepcopy/shallow-copy choices.
- Comment why Exec._change_id, Container._base_plan, and
  StoredState._data_type_name remain in __init__ despite being private.
@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

This has two ticks, but one is Dima's so it doesn't pass any more. @tromai would you mind having a look?

@tonyandrewmeyer tonyandrewmeyer requested a review from tromai June 9, 2026 07:51
@james-garner-canonical

Copy link
Copy Markdown
Contributor

New concern that occurred to me today: adding a custom __init__ (which doesn't call __post_init__) breaks anyone who subclassed one of these classes and defined their own __post_init__ (it would no longer be called).

I guess we should (re-)check with Hyrum? I suppose checking for new failures in unit tests would be sufficient, but it also feels like something it should be possible to analyse statically ... perhaps a feature for another time.

@benhoyt

benhoyt commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

We can definitely re-check with Hyrum. However, I guess we have to step back and decide what we're promising when exposing dataclasses as part of the public API (and document that). It seems to me people shouldn't need to subclass testing state classes, and maybe we should rule that out.

@tromai tromai left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for waiting. The current changes look good to me.

It took me sometime to understand that dataclasses attribute type definition can be different from the types in __init__ signature. These classes need additional logic in __init__ to "pre-process" the parameters.

The changes make sense to me after I wrap my head around it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ops[testing] State.config type, should it be dict or Mapping?

5 participants