refactor: distinguish between init and attribute types in testing state classes#2331
refactor: distinguish between init and attribute types in testing state classes#2331tonyandrewmeyer wants to merge 15 commits into
Conversation
…ting and types attributes will be
james-garner-canonical
left a comment
There was a problem hiding this comment.
I'm a big fan of making this change, thanks for taking care of this. I have a number of suggestions around typing and defaults, which I've made on individual lines, though they typically apply to more lines across the PR -- but I figured I'd keep the comments fewer than they'd otherwise be ...
Do you think this change warrants some additional unit tests, or are you happy that the existing tests would catch any errors in this PR? If the latter, please mention the relevant test suites.
In terms of not breaking the In terms of tests for the changes, I'm not super keen on having tests like: c = CloudCredential(auth_type="foo", redacted=['a', 'b', 'c'])
assert isinstance(c.redacted, list)I know we have some tests where we expect pyright to find issues, but I'm not sure it's the right move to add something like that for this either. Do you have any suggestions in terms of tests? |
Nice!
I wouldn't mind seeing tests a bit like the one you're not keen on, explicitly encoding (from a user perspective) the type conversion and copying behaviour that we're implementing. redacted = ['a', 'b', 'c']
...
c = CloudCredential(auth_type="foo", redacted=redacted, ...)
assert isinstance(c.redacted, list) # or tuple if we go that way
assert c.redacted == redacted
assert c.redacted is not redacted
...The existing tests probably do cover a lot of this, but a lot of them are hard to follow at a glance due to the parametrization and abstraction. |
dimaqq
left a comment
There was a problem hiding this comment.
I kinda like this.
If it works, it's fair to merge :)
Happy to leave the details to James.
|
@james-garner-canonical brought up the excellent point that these are frozen dataclasses that we want people to treat as immutable. So giving their type checker information that they have a list (rather than an immutable Sequence) or a dict (rather than an immutable Mapping) leads them to where we don't want to go. So rejecting this instead. |
…ave type checkers alert to mutating the state.
|
@james-garner-canonical I've adjusted per the discussion we had earlier in the week, and this should be good for reviewing again now, thanks! |
james-garner-canonical
left a comment
There was a problem hiding this comment.
I really like the direction here, and I definitely think it's worth making these changes to decouple the __init__ argument typing from the attribute typing.
I've flagged a number of items that I think require some further thought before merging.
| object.__setattr__(self, 'content', dict(content)) | ||
| object.__setattr__(self, '_data_type_name', _data_type_name) | ||
| _deepcopy_mutable_fields(self) |
There was a problem hiding this comment.
| object.__setattr__(self, 'content', dict(content)) | |
| object.__setattr__(self, '_data_type_name', _data_type_name) | |
| _deepcopy_mutable_fields(self) | |
| object.__setattr__(self, 'content', copy.deepcopy(content)) | |
| object.__setattr__(self, '_data_type_name', _data_type_name) |
| relations: Iterable[RelationBase] = dataclasses.field(default_factory=frozenset) | ||
| relations: frozenset[RelationBase] |
There was a problem hiding this comment.
Should we use Collection instead for all these attributes?
Co-authored-by: James Garner <james.garner@canonical.com>
|
I'll wait for Tony and James to hash it out. |
dimaqq
left a comment
There was a problem hiding this comment.
WDYT about settings up a working group (maybe James and Tony only) to figure out the minutae?
|
|
||
| AnyJson = str | bool | dict[str, 'AnyJson'] | int | float | list['AnyJson'] | ||
| RawSecretRevisionContents = RawDataBagContents = dict[str, str] | ||
| RawSecretRevisionContents = RawDataBagContents = Mapping[str, str] |
There was a problem hiding this comment.
This one I'm not convinced about, because it's an "in-out" field:
# user passes data in, Mapping is great here
rel = Relation(..., remote_app_data=this_is_now_a_mapping)
state = State(relations={rel})# user inspects of modified data, dict is better here
state = ctx.run(..., state)
rel = state.get_relation(rel)
# what we expect charmers to do, Mapping is helpful
assert rel.remote_units_data[unit]["foo"] == '"bar"'
# what charmers may also do, I think thaththis PR is a breaking change wrt. typing
rel.remote_units_data[unit].setdefault("foo", '"bar"')The intention of having this field typed as a Mapping is good, IMO.
The practical implications though, I'm not sure about.
james-garner-canonical
left a comment
There was a problem hiding this comment.
Looks good, apologies for the delays with this one. Thanks for all the work and super-tox-ing.
I've commented in one place about Iterable/Sequence[str] typing permitting bare str arguments. This is largely a pre-existing issue, except that inlining _deepcopy_mutable_fields leads to a behaviour change where a bare str now gets list called on it. Either way it would have been a technical error that should always have resulted in a runtime error at some point (though would it always have?). But if it's not a runtime error then we are introducing a behaviour change that could show up in state.
I'd be on board with adding an explicit check and error for bare str if that doesn't break existing tests.
| identity_endpoint: str | None = None, | ||
| storage_endpoint: str | None = None, | ||
| credential: CloudCredential | None = None, | ||
| ca_certificates: Iterable[str] = (), |
There was a problem hiding this comment.
This applied before the PR when the argument was typed as Sequence[str] -- ca_credentials='some string' is legal at type checking time.
Previously, _deepcopy_mutable_variables would have left us with ca_certificates = 'some string'. Now we'd get ca_certificates = ['s', 'o', 'm', ...].
I assume that both of these are bad and may result in an error at some later point, but maybe they can silently pass through a test in some cases? Presumably this would ideally be an error straight away. Alternatively, a legal shorthand for ca_certificates = ['some string'].
There was a problem hiding this comment.
Good catch. Added a _list_of_str helper that raises StateValidationError when a bare str is passed where an iterable of strings is expected, and applied it to redacted, ca_certificates, ingress_addresses, egress_subnets, and the values in Secret.remote_grants. Existing tests still pass and there's a new test_bare_str_rejected parametrised test covering each site.
Add a _list_of_str helper that raises StateValidationError when a bare str is passed where an iterable of strings is expected, and apply it to redacted, ca_certificates, ingress_addresses, egress_subnets, and the values in Secret.remote_grants.
- _list_of_str now also rejects bytes (which iterate as ints). - State.config is now copied so the caller's dict isn't aliased. - Comment Secret._tracked_revision/_latest_revision: class attrs, not fields, deliberately excluded from __eq__. - Comment RelationBase: still uses dataclass-generated __init__, asymmetry with the rest of the module is deliberate. - Comment Container's mixed deepcopy/shallow-copy choices. - Comment why Exec._change_id, Container._base_plan, and StoredState._data_type_name remain in __init__ despite being private.
|
This has two ticks, but one is Dima's so it doesn't pass any more. @tromai would you mind having a look? |
|
New concern that occurred to me today: adding a custom I guess we should (re-)check with Hyrum? I suppose checking for new failures in unit tests would be sufficient, but it also feels like something it should be possible to analyse statically ... perhaps a feature for another time. |
|
We can definitely re-check with Hyrum. However, I guess we have to step back and decide what we're promising when exposing dataclasses as part of the public API (and document that). It seems to me people shouldn't need to subclass testing state classes, and maybe we should rule that out. |
There was a problem hiding this comment.
Thank you for waiting. The current changes look good to me.
It took me sometime to understand that dataclasses attribute type definition can be different from the types in __init__ signature. These classes need additional logic in __init__ to "pre-process" the parameters.
The changes make sense to me after I wrap my head around it.
When designing the Scenario 7 API we introduced kw-only args, and originally had custom
__init__for each state class to support that. We decided to change that because it felt busy and a lot of maintenance.However, we currently have an unfortunate mismatch between some of the types accepted to create an instance of a state class and the type the corresponding attribute will be. For example, the init might accept any
Mappingbut we know the attribute will always be adict. It would be nice to provide that information to users.Now that we are using Python 3.10+, we do have some classes without this issue that can continue to use the dataclasses generated
__init__. However, there are many that would be better as more explicit, and I am not convinced it's too much work to maintain.We opened the door to this in #2274 adjusting
CheckInfo. This PR applies the same improvement to the rest of the state classes.A couple of small behaviour changes fall out of this:
*_addresses,*_subnets,redacted,ca_certificates, andSecret.remote_grantsarguments now reject a barestr(orbytes) up front. Previously this would have silently iterated the string character-by-character, leaving you with['s', 'o', 'm', ...]in the attribute.app_status=Noneorunit_status=NonetoStatenow coerces toUnknownStatus(), where previously it would have raisedTypeError. ExplicitNoneis now treated the same as omitting the argument.Fixes #2152