fix: chdir to charm root during event in Scenario#2550
Conversation
Under Juju, the current working directory at the start of a hook is the charm root; under Scenario it was wherever pytest ran from. Charms (and libraries) that read files relative to the cwd therefore break only under Scenario. Mirror the existing JUJU_CHARM_DIR handling: save the cwd, chdir to temporary_charm_root after updating the env, and restore in finally before the virtual charm root tempdir is cleaned up. Closes canonical#2045.
|
The obvious concern here is backwards compatibility, and validly so. I ran B → C diff: 10 charms flipped passed → failed, every one cwd-attributable. Of those, 7 raise Some options:
Correction on the Certum_Trusted_Root_CA.pem characterisation above: re-checking the source, neither charm has any rm ./Certum_Trusted_Root_CA.pem in charm or test code. Both shell out to subprocess.run(["update-ca-certificates", "--fresh"]) on the host (prometheus-k8s-operator/src/charm.py:624, grafana-agent-operator/src/charm.py:446). That command rewrites /etc/ssl/certs/ and prints Adding debian:Certum_Trusted_Root_CA.pem to stdout, which is where the path came from in the failure log. Since update-ca-certificates --fresh uses absolute paths, these two are likely not cwd-attributable and the real count is closer to 8. |
james-garner-canonical
left a comment
There was a problem hiding this comment.
The obvious concern here is backwards compatibility, and validly so.
I ran
hyrum unit --framework scenario --workers 2over the curated 233-charm cache. Three runs over the same 50-charm filter set: A--no-patch(88% pass), B swap tocanonical/operator:main(34/50 pass: the 6-charm degradation vs A is unrelated dependency-swap noise, same in both controls), C swap to the patched branch (24/50 pass).
Why the 50 vs 233 charm count discrepancy here?
B → C diff: 10 charms flipped passed → failed, every one cwd-attributable. Of those, 7 raise
FileNotFoundErroron a path that's relative-to-cwd (charm code readstemplates/X.j2/src/Y.json/src/prometheus_alert_rules/...directly instead of fromself.framework.charm_dir); 2 surface as JinjaTemplateNotFound(vault-k8s-operator/{k8s,machine}, the same root cause via the template loader's search path); and 2 (prometheus-k8s-operator,grafana-agent-operator) shell out torm ./Certum_Trusted_Root_CA.pemfor test cleanup. Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions, which is the patch incidentally surfacing a test-side safety bug (test would have deleted a system file if it ran with elevated perms).Some options:
- We accept that, although wrong, this is the behaviour of Scenario.
- As above, but add a DeprecationWarning and change it in a future major version bump.
- Add a flag (environment variable maybe, like the others we have) to switch the behaviour, plan to swap the defaults in a future major version bump.
- Accept that this is a fix and break tests. We can open PRs against the known 10 charms in advance.
Modulo improving my understanding of our current behaviour with the temporary charm root, I lean towards the deprecation warning + change -- our concurrent push to have charms treat warnings as errors should hopefully move people in the direction we want. I'm open to the environment variable gating with default swapping in the future release as well.
That said, contingent on us actually finding all the users this would break, and again modulo my understanding of our current behaviour, I'm not opposed to the fix approach.
| previous_cwd = os.getcwd() | ||
| os.chdir(temporary_charm_root) |
There was a problem hiding this comment.
I'm a little confused about the current behaviour -- if we didn't previously expose the temporary_charm_root to the charm under test by making it the current working directory, how were its contents actually exposed to and (potentially) used by the charm during testing?
There was a problem hiding this comment.
The virtual charm root is exposed via the JUJU_CHARM_DIR env var, which ops reads into framework.charm_dir. Charms that use self.framework.charm_dir / "templates/foo.j2 were fine.
The problem is code that uses relative paths, assuming the current directory, like open("templates/foo.j2"). Those rely on the real location you get in Juju being the charm root. Under Scenario (main), that's is wherever pytest ran, so those paths resolve into the test directory.
When no charm_root is passed, the tempdir contents are just the metadata/config/actions YAMLs that ops loads through charm_dir anyway (although there's a whole conversation there about maybe it should be whatever would get packed). When a custom charm_root is passed, its files were reachable through charm_dir but not via relative paths.
The scenario filter. There are lots of charms that are not ops, and lots that don't use Scenario so wouldn't be impacted. |
|
The fix looks good to me.
😨. I am curious how it happened. I couldn't find an example searching through their source code. Overall, I am also okay with these two options:
I lean toward the first one more. To me, the DeprecationWarning can communicate to authors of 10 charms. We can add the flag if they request keeping the behavior in #2045. |
You're right to be suspicious. I went back to the source and Certum_Trusted_Root_CA.pem doesn't appear anywhere in those two charms' code. What actually happens is that prometheus-k8s-operator (src/charm.py:624) and grafana-agent-operator (src/charm.py:446) call I conflated stdout with file access in my original write-up; apologies for the noise. Since |
…ario Under Juju the working directory at hook start is the charm root; under Scenario it is wherever pytest ran. Unconditionally chdir'ing to the charm root, as the previous commit on this branch did, broke roughly eight charms in the curated cache. Switch to an audit-hook based deprecation path: by default leave the working directory alone but install a `sys.addaudithook` (scoped via `threading.local` to event dispatch) that emits a `DeprecationWarning` the first time charm code opens, lists, or scans a relative path that would resolve differently between the test cwd and the charm root. Set `SCENARIO_CHDIR_TO_CHARM_ROOT=1` to opt in to the Juju-matching behaviour now, ahead of the default flipping in a future major release. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
I've changed this to go the deprecation warning and env var direction. Not simple to warn at the right time - I've used an audit hook (added in 3.8, so fine for our 3.10+), which is a little wild, and my first use of these for real, but doesn't seem outside the bounds of what they are meant for. |
I'm particularly concerned about |
| if path is None or isinstance(path, int): | ||
| return |
There was a problem hiding this comment.
I'm curious in which case can path be an int.
I am thinking of the argument buffering to open, is that correct?
| state = getattr(_CWD_AUDIT_STATE, 'current', None) | ||
| if state is None: | ||
| return | ||
| if event not in ('open', 'os.listdir', 'os.scandir'): |
There was a problem hiding this comment.
I'm curious about the criteria for selecting this set of events. I assume it came from the 8 charms which depends on Scenario's 'unexpected' dir behavior ? If that's true, it's all good for me, as I don't think we should include more events.
| return | ||
| if path_str in state['warned']: | ||
| return | ||
| state['reentrant'] = True |
There was a problem hiding this comment.
Just to confirm, reentrant is used to prevent infinite recursion from this function because we do file operations included in the warning event ('open', 'os.listdir', 'os.scandir')?
If that is true, may I suggest having a small comment describing the purpose here?
Under Juju, the current working directory at the start of a hook is the charm root; under Scenario it is wherever pytest ran from. Charms (and libraries) that read files relative to the cwd therefore break only under Scenario.
This PR mirrors the existing
JUJU_CHARM_DIRhandling: save the cwd, chdir totemporary_charm_rootafter updating the env, and restore in finally before the virtual charm root tempdir is cleaned up. However, to avoid breaking existing charms, it only does this if an environment variable is set, and otherwise warns for incorrect behaviour. Ideally, in the next major version we would flip the default for the env variable.Fixes #2045.