Skip to content

fix: chdir to charm root during event in Scenario#2550

Open
tonyandrewmeyer wants to merge 4 commits into
canonical:mainfrom
tonyandrewmeyer:rainy/2045-cwd-charm-root-in-scenario
Open

fix: chdir to charm root during event in Scenario#2550
tonyandrewmeyer wants to merge 4 commits into
canonical:mainfrom
tonyandrewmeyer:rainy/2045-cwd-charm-root-in-scenario

Conversation

@tonyandrewmeyer

@tonyandrewmeyer tonyandrewmeyer commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Under Juju, the current working directory at the start of a hook is the charm root; under Scenario it is wherever pytest ran from. Charms (and libraries) that read files relative to the cwd therefore break only under Scenario.

This PR mirrors the existing JUJU_CHARM_DIR handling: save the cwd, chdir to temporary_charm_root after updating the env, and restore in finally before the virtual charm root tempdir is cleaned up. However, to avoid breaking existing charms, it only does this if an environment variable is set, and otherwise warns for incorrect behaviour. Ideally, in the next major version we would flip the default for the env variable.

Fixes #2045.

Under Juju, the current working directory at the start of a hook is
the charm root; under Scenario it was wherever pytest ran from.
Charms (and libraries) that read files relative to the cwd therefore
break only under Scenario.

Mirror the existing JUJU_CHARM_DIR handling: save the cwd, chdir to
temporary_charm_root after updating the env, and restore in finally
before the virtual charm root tempdir is cleaned up.

Closes canonical#2045.
@tonyandrewmeyer

tonyandrewmeyer commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator Author

The obvious concern here is backwards compatibility, and validly so.

I ran hyrum unit --framework scenario --workers 2 over the curated 233-charm cache. Three runs over the same 50-charm filter set: A --no-patch (88% pass), B swap to canonical/operator:main (34/50 pass: the 6-charm degradation vs A is unrelated dependency-swap noise, same in both controls), C swap to the patched branch (24/50 pass).

B → C diff: 10 charms flipped passed → failed, every one cwd-attributable. Of those, 7 raise FileNotFoundError on a path that's relative-to-cwd (charm code reads templates/X.j2 / src/Y.json / src/prometheus_alert_rules/... directly instead of from self.framework.charm_dir); 2 surface as Jinja TemplateNotFound (vault-k8s-operator/{k8s,machine}, the same root cause via the template loader's search path); and 2 (prometheus-k8s-operator, grafana-agent-operator) shell out to rm ./Certum_Trusted_Root_CA.pem for test cleanup. Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions, which is the patch incidentally surfacing a test-side safety bug (test would have deleted a system file if it ran with elevated perms).

Some options:

  • We accept that, although wrong, this is the behaviour of Scenario.
  • As above, but add a DeprecationWarning and change it in a future major version bump.
  • Add a flag (environment variable maybe, like the others we have) to switch the behaviour, plan to swap the defaults in a future major version bump.
  • Accept that this is a fix and break tests. We can open PRs against the known 10 charms in advance.

Correction on the Certum_Trusted_Root_CA.pem characterisation above: re-checking the source, neither charm has any rm ./Certum_Trusted_Root_CA.pem in charm or test code. Both shell out to subprocess.run(["update-ca-certificates", "--fresh"]) on the host (prometheus-k8s-operator/src/charm.py:624, grafana-agent-operator/src/charm.py:446). That command rewrites /etc/ssl/certs/ and prints Adding debian:Certum_Trusted_Root_CA.pem to stdout, which is where the path came from in the failure log. Since update-ca-certificates --fresh uses absolute paths, these two are likely not cwd-attributable and the real count is closer to 8.

@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review June 7, 2026 23:13

@james-garner-canonical james-garner-canonical left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The obvious concern here is backwards compatibility, and validly so.

I ran hyrum unit --framework scenario --workers 2 over the curated 233-charm cache. Three runs over the same 50-charm filter set: A --no-patch (88% pass), B swap to canonical/operator:main (34/50 pass: the 6-charm degradation vs A is unrelated dependency-swap noise, same in both controls), C swap to the patched branch (24/50 pass).

Why the 50 vs 233 charm count discrepancy here?

B → C diff: 10 charms flipped passed → failed, every one cwd-attributable. Of those, 7 raise FileNotFoundError on a path that's relative-to-cwd (charm code reads templates/X.j2 / src/Y.json / src/prometheus_alert_rules/... directly instead of from self.framework.charm_dir); 2 surface as Jinja TemplateNotFound (vault-k8s-operator/{k8s,machine}, the same root cause via the template loader's search path); and 2 (prometheus-k8s-operator, grafana-agent-operator) shell out to rm ./Certum_Trusted_Root_CA.pem for test cleanup. Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions, which is the patch incidentally surfacing a test-side safety bug (test would have deleted a system file if it ran with elevated perms).

Some options:

  • We accept that, although wrong, this is the behaviour of Scenario.
  • As above, but add a DeprecationWarning and change it in a future major version bump.
  • Add a flag (environment variable maybe, like the others we have) to switch the behaviour, plan to swap the defaults in a future major version bump.
  • Accept that this is a fix and break tests. We can open PRs against the known 10 charms in advance.

Modulo improving my understanding of our current behaviour with the temporary charm root, I lean towards the deprecation warning + change -- our concurrent push to have charms treat warnings as errors should hopefully move people in the direction we want. I'm open to the environment variable gating with default swapping in the future release as well.

That said, contingent on us actually finding all the users this would break, and again modulo my understanding of our current behaviour, I'm not opposed to the fix approach.

Comment thread testing/src/scenario/_runtime.py Outdated
Comment on lines +328 to +329
previous_cwd = os.getcwd()
os.chdir(temporary_charm_root)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the current behaviour -- if we didn't previously expose the temporary_charm_root to the charm under test by making it the current working directory, how were its contents actually exposed to and (potentially) used by the charm during testing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The virtual charm root is exposed via the JUJU_CHARM_DIR env var, which ops reads into framework.charm_dir. Charms that use self.framework.charm_dir / "templates/foo.j2 were fine.

The problem is code that uses relative paths, assuming the current directory, like open("templates/foo.j2"). Those rely on the real location you get in Juju being the charm root. Under Scenario (main), that's is wherever pytest ran, so those paths resolve into the test directory.

When no charm_root is passed, the tempdir contents are just the metadata/config/actions YAMLs that ops loads through charm_dir anyway (although there's a whole conversation there about maybe it should be whatever would get packed). When a custom charm_root is passed, its files were reachable through charm_dir but not via relative paths.

@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

Why the 50 vs 233 charm count discrepancy here?

The scenario filter. There are lots of charms that are not ops, and lots that don't use Scenario so wouldn't be impacted.

@tromai

tromai commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

The fix looks good to me.

Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions

😨. I am curious how it happened. I couldn't find an example searching through their source code.

Overall, I am also okay with these two options:

  1. As above, but add a DeprecationWarning and change it in a future major version bump.
  2. Add a flag (environment variable maybe, like the others we have) to switch the behaviour, plan to swap the defaults in a future major version bump.

I lean toward the first one more. To me, the DeprecationWarning can communicate to authors of 10 charms. We can add the flag if they request keeping the behavior in #2045.

@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions

😨. I am curious how it happened. I couldn't find an example searching through their source code.

You're right to be suspicious. I went back to the source and Certum_Trusted_Root_CA.pem doesn't appear anywhere in those two charms' code. What actually happens is that prometheus-k8s-operator (src/charm.py:624) and grafana-agent-operator (src/charm.py:446) call subprocess.run(["update-ca-certificates", "--fresh"]) on the host. That command rewrites /etc/ssl/certs/, printing lines like Adding debian:Certum_Trusted_Root_CA.pem to stdout, which is where the path came from in the failure log.

I conflated stdout with file access in my original write-up; apologies for the noise.

Since update-ca-certificates --fresh uses absolute paths and not cwd-relative ones, those two probably aren't cwd-attributable at all, likely run-to-run noise on a command that's flaky under tests anyway. Real count is closer to 8 cwd-attributable charms, not 10.

tonyandrewmeyer and others added 3 commits June 13, 2026 10:47
…ario

Under Juju the working directory at hook start is the charm root; under
Scenario it is wherever pytest ran. Unconditionally chdir'ing to the
charm root, as the previous commit on this branch did, broke roughly
eight charms in the curated cache.

Switch to an audit-hook based deprecation path: by default leave the
working directory alone but install a `sys.addaudithook` (scoped via
`threading.local` to event dispatch) that emits a `DeprecationWarning`
the first time charm code opens, lists, or scans a relative path that
would resolve differently between the test cwd and the charm root. Set
`SCENARIO_CHDIR_TO_CHARM_ROOT=1` to opt in to the Juju-matching
behaviour now, ahead of the default flipping in a future major release.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tonyandrewmeyer

Copy link
Copy Markdown
Collaborator Author

I've changed this to go the deprecation warning and env var direction. Not simple to warn at the right time - I've used an audit hook (added in 3.8, so fine for our 3.10+), which is a little wild, and my first use of these for real, but doesn't seem outside the bounds of what they are meant for.

@james-garner-canonical

Copy link
Copy Markdown
Contributor

B → C diff: 10 charms flipped passed → failed, every one cwd-attributable. Of those, 7 raise FileNotFoundError on a path that's relative-to-cwd (charm code reads templates/X.j2 / src/Y.json / src/prometheus_alert_rules/... directly instead of from self.framework.charm_dir); 2 surface as Jinja TemplateNotFound (vault-k8s-operator/{k8s,machine}, the same root cause via the template loader's search path); and 2 (prometheus-k8s-operator, grafana-agent-operator) shell out to rm ./Certum_Trusted_Root_CA.pem for test cleanup. Pre-patch that hits the test's own working copy; post-patch the relative path resolves into the system CA store and bounces off permissions, which is the patch incidentally surfacing a test-side safety bug (test would have deleted a system file if it ran with elevated perms).

I'm particularly concerned about FileNotFoundError on a path that's relative to the actual charm root (src/Y.json, src/prometheus...) etc. Arguably they should use self.framework.charm_dir, sure, but aren't these particular breakages the opposite of what we're trying to address with this PR? Here the charm is presumably running pytest from the charm root (matching Juju's behaviour of running the charm from there), and we're introducing an error for them with this change due to our selective population of the virtual charm root.

@tromai tromai left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the deprecation warning. Audit hook looks really powerful to me.

I made a minor request, and some questions to help me understand the fix better.

Comment on lines +66 to +67
if path is None or isinstance(path, int):
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious in which case can path be an int.
I am thinking of the argument buffering to open, is that correct?

state = getattr(_CWD_AUDIT_STATE, 'current', None)
if state is None:
return
if event not in ('open', 'os.listdir', 'os.scandir'):

@tromai tromai Jun 18, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the criteria for selecting this set of events. I assume it came from the 8 charms which depends on Scenario's 'unexpected' dir behavior ? If that's true, it's all good for me, as I don't think we should include more events.

return
if path_str in state['warned']:
return
state['reentrant'] = True

@tromai tromai Jun 18, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, reentrant is used to prevent infinite recursion from this function because we do file operations included in the warning event ('open', 'os.listdir', 'os.scandir')?

If that is true, may I suggest having a small comment describing the purpose here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

os.getcwd() is not the charm root in ops[testing]

3 participants