
Import NGWPC Python InterpreterUtil changes #968

Open
PhilMiller wants to merge 4 commits into NOAA-OWP:master from PhilMiller:PhilMiller/python-interpreter-stuff

Conversation

@PhilMiller
Contributor

@PhilMiller PhilMiller commented May 15, 2026

We encountered a bug resulting from an exception safety deficiency in InterpreterUtil: when loading a module failed, it would leave a NULL entry in its module map, and later attempts to use that module would then dereference the invalid entry.

Additions

  • Introduce a regression test that is confirmed to actually fail under the original conditions in which the bug was observed

Changes

  • Fix the exception safety bug
  • Shift code from the header to a separate source file in its own library target

Notes

I suggest reviewing each commit separately, since the test and fix are small and substantive, while the last commit is larger and purely a structural refactoring.

The bugfix and code reorganization are as in the NGWPC development branch, while the other commits are new to this PR.

Testing

  1. The introduced test fails without the subsequent fix when the codebase is forcibly built as C++14 rather than C++17, and passes with it
  2. GitHub CI

Screenshots

Before:

[----------] 1 test from InterpreterUtilTest
[ RUN      ] InterpreterUtilTest.FailedImportDoesNotPoisonModuleMap
/Users/phil/Code/noaa/ngen/test/utils/python/InterpreterUtil_Test.cpp:63: Failure
Expected: module = interp->getModule(bogus_name) throws an exception of type py::error_already_set.
  Actual: it throws nothing.
/Users/phil/Code/noaa/ngen/test/utils/python/InterpreterUtil_Test.cpp:64: Failure
Expected equality of these values:
  module.ptr()
    Which is: NULL
  module_saved.ptr()
    Which is: 0x1053fb6f0
[  FAILED  ] InterpreterUtilTest.FailedImportDoesNotPoisonModuleMap (0 ms)
[----------] 1 test from InterpreterUtilTest (0 ms total)

After:

[----------] 1 test from InterpreterUtilTest
[ RUN      ] InterpreterUtilTest.FailedImportDoesNotPoisonModuleMap
WARNING: importTopLevelModule: ngen_definitely_not_a_real_module_xyzzy: ModuleNotFoundError: No module named 'ngen_definitely_not_a_real_module_xyzzy'
Already imported modules: numpy, pathlib, sys, test_bmi_py, 
WARNING: importTopLevelModule: ngen_definitely_not_a_real_module_xyzzy: ModuleNotFoundError: No module named 'ngen_definitely_not_a_real_module_xyzzy'
Already imported modules: numpy, pathlib, sys, test_bmi_py, 
[       OK ] InterpreterUtilTest.FailedImportDoesNotPoisonModuleMap (0 ms)
[----------] 1 test from InterpreterUtilTest (0 ms total)

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Comment thread include/utilities/python/InterpreterUtil.hpp Outdated
@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch from 4858170 to 5a018c3 Compare May 15, 2026 17:49
@PhilMiller PhilMiller marked this pull request as draft May 15, 2026 17:59
@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch from 5a018c3 to 3108309 Compare May 15, 2026 18:00
@PhilMiller
Contributor Author

The test_unit CI job doesn't enable Python, so the failure doesn't show up as red here. I'll address that somehow.

@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch from 3108309 to b03270f Compare May 15, 2026 21:08
@PhilMiller
Contributor Author

Lol, so in trying to reproduce the error in a unit/regression test to red/green this PR, I realized that the move to C++17 somewhat obviates the change. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf guarantees that, in an assignment, the RHS is fully evaluated (and hence the exception thrown) before the LHS starts to be evaluated and creates the NULL default-constructed map entry.

I think it's still worth integrating this regardless:

  • in case someone wants to apply it to a slightly older version of the code,
  • to work down the differences from the NGWPC version of the codebase, and
  • to make the exception safety more explicit, rather than relying on subtle language behavior.
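A minimal standalone sketch of the C++17 rule described above, assuming the code is compiled as C++17 or later. Here `std::shared_ptr<int>` stands in for `pybind11::object` (a default-constructed one is null), and the hypothetical `failingImport()` stands in for a failing `pybind11::module_::import()`. Since P0145R3 sequences the right operand of `=` before the left, and that ordering also applies to an overloaded `operator=` written in operator notation, the exception escapes before `operator[]` gets a chance to insert anything.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an import that fails, like
// pybind11::module_::import() throwing py::error_already_set.
std::shared_ptr<int> failingImport(const std::string& name) {
    throw std::runtime_error("No module named '" + name + "'");
}

// Returns true if a failed `modules[name] = failingImport(name)` left an
// entry behind in the map.
bool failedAssignmentLeavesEntry(const std::string& name) {
    std::map<std::string, std::shared_ptr<int>> modules;
    try {
        // C++17: the RHS is evaluated (and throws) before the LHS, so
        // operator[] never runs and no null entry is inserted.
        modules[name] = failingImport(name);
    } catch (const std::runtime_error&) {
        // Ignore the failure; only the map's state matters here.
    }
    return modules.count(name) != 0;
}
```

Built as C++14, the same expression was allowed to evaluate `modules[name]` first, which is why the regression test only goes red under a forced C++14 build.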

PhilMiller and others added 2 commits May 15, 2026 15:18
…ule()

If the call to pybind11's pybind11::module_::import(topLevelName)
failed and threw an exception, the map of importedTopLevelModules was
left with a default-constructed entry containing a pybind11::object
with a NULL pointer. Later attempts to use that top level module would
believe it had been imported, because the key existed in the map, and
then dereference that NULL pointer and crash.

Thus, separate the import call from the modification to the map, so
that the map is left unmodified if the import throws.
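A standalone sketch of the failure and of the "import first, then modify the map" fix this commit message describes. `Module` and `importModule()` are hypothetical stand-ins for `pybind11::object` and `pybind11::module_::import()`, and this is an illustration rather than the PR's actual diff. `importUnsafe()` pins down the evaluation order a C++14 compiler was permitted to choose for `map[name] = import(name)`, so it reproduces the poisoned entry under any language standard.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins: a default-constructed Module is null, like a
// default-constructed pybind11::object.
using Module = std::shared_ptr<std::string>;
using ModuleMap = std::map<std::string, Module>;

Module importModule(const std::string& name) {
    // Names with this prefix simulate a ModuleNotFoundError.
    if (name.rfind("ngen_definitely_not", 0) == 0) {
        throw std::runtime_error("ModuleNotFoundError: No module named '" + name + "'");
    }
    return std::make_shared<std::string>(name);
}

// Unsafe: operator[] inserts a null Module before the import runs, so a
// throwing import leaves the null entry behind.
void importUnsafe(ModuleMap& modules, const std::string& name) {
    Module& slot = modules[name];
    slot = importModule(name);
}

// Safe: import first, touch the map only after the import succeeded.
void importSafe(ModuleMap& modules, const std::string& name) {
    if (modules.count(name) != 0) {
        return;  // already imported
    }
    Module loaded = importModule(name);  // may throw; map still untouched
    modules.emplace(name, std::move(loaded));
}
```

After a failed `importUnsafe()` the map holds a null entry under the bogus name, which later lookups would mistake for an imported module; after a failed `importSafe()` the map is unchanged.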
@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch 2 times, most recently from 88f67fb to 8aee3a3 Compare May 15, 2026 23:25
@PhilMiller PhilMiller marked this pull request as ready for review May 15, 2026 23:34
Contributor

@christophertubbs christophertubbs left a comment

I'd prefer the strict version checking be relaxed a tad - requiring 3.11 or something? Great. Requiring 3.11.6, specifically? A tad dicey.

A potential risk factor I've found is a lack of GIL locking (unless I missed it somewhere). That's a ticking multithreaded time bomb.

I understand that a good bit of my concerns are a tad out of scope - if you're just trying to fix one thing and move implementation code out of the header, you don't necessarily want to rock the boat too much. But... there are technically new lines of code so I reserve the right to complain.

I'm setting this review as just a comment - someone with a higher skill level should review. A lot of this is just a tad too clever for me and I think most of my complaints are preexisting issues.

@@ -32,7 +31,6 @@ namespace utils {
class InterpreterUtil {
Contributor

Maybe out of scope, but I wish there were a clearer "This is why we're juggling public and private scope definitions" comment. It made sense after a bit of research, though.

Contributor Author

Yeah, I dunno. There seems to be some sort of intuitive order in these declarations, and I'm just not messing with it here. Perhaps the juggling is more noticeable because there are no longer function bodies interspersed to hide that it's going on.

Contributor

I believe it's in that order because the members need to be destroyed in a highly specific order.

My primary concern is that someone will go through and try to "fix" it.
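The underlying language rule is that non-static data members are destroyed in reverse order of declaration, so reordering declarations silently reorders destruction. A sketch with hypothetical names, assuming (not confirmed by the diff shown here) that InterpreterUtil needs its interpreter guard declared before the objects that must be released while the interpreter is still alive:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which members are destroyed.
std::vector<std::string> destructionLog;

// Hypothetical member type that logs its own destruction.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { destructionLog.push_back(name); }
};

// Declared first, destroyed last: the guard outlives the cached objects.
struct Holder {
    Tracked guard{"interpreter guard"};
    Tracked modules{"cached modules"};
};

void destroyHolder() {
    Holder h;
}  // h's members are destroyed here, in reverse declaration order
```

Swapping the two declarations in `Holder` would flip the log, which is exactly the kind of innocent-looking "fix" that could break the real class.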

Comment thread src/utilities/python/InterpreterUtil.cpp
}

InterpreterUtil::InterpreterUtil() {
guardPtr = std::make_shared<py::scoped_interpreter>();
Contributor

this-> would make my life easier. Without it, I have to go hunting for more information on it. Not a huge deal, but it would make it easier for the lower skilled C++ dev.

Contributor Author

I've worked in a whole bunch of different C++ code bases, and this-> has not been idiomatic in any of them. It's kinda taken as a signature of someone who learned to code in Java, and never really acclimated to the switch to C++.

More common conventions for marking member variables as distinct from arguments and locals were a prefix (m_foo, or just _foo) or a suffix (foo_, my preference).

In the m_ case, globals were g_, function arguments a_, static variables s_, and local variables were unmarked. Even that was a pretty eccentric style, one I haven't seen in any code not sharing that library's heritage (LBL BoxLib/Chombo/AMReX).
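For readers less used to these conventions, a hypothetical illustration of one member written in the foo_ suffix style, the m_ prefix style, and with this->. All three behave identically; they differ only in how visibly a name is marked as a member. `Guard` is a placeholder type, not the real `py::scoped_interpreter`.

```cpp
#include <cassert>
#include <memory>

// Placeholder for whatever the member actually holds.
struct Guard {};

// Suffix style: a trailing underscore marks members.
class SuffixStyle {
public:
    SuffixStyle() : guard_(std::make_shared<Guard>()) {}
    bool hasGuard() const { return guard_ != nullptr; }
private:
    std::shared_ptr<Guard> guard_;
};

// Prefix style: an m_ prefix marks members.
class PrefixStyle {
public:
    PrefixStyle() : m_guard(std::make_shared<Guard>()) {}
    bool hasGuard() const { return m_guard != nullptr; }
private:
    std::shared_ptr<Guard> m_guard;
};

// Unmarked name, disambiguated at each use with this->.
class ThisArrowStyle {
public:
    ThisArrowStyle() { this->guardPtr = std::make_shared<Guard>(); }
    bool hasGuard() const { return this->guardPtr != nullptr; }
private:
    std::shared_ptr<Guard> guardPtr;
};
```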

Contributor

It's kinda taken as a signature of someone who learned to code in Java, and never really acclimated to the switch to C++.

Rude

Anyways, I generally go with foo_ as well. I should have put "my" and "lower skilled C++ dev" in bold. We're going to have a lot of primarily Python and R devs passing through this, so I always err on the side of "as obvious as humanly possible, even if it's a tad excessive". As it stands, guardPtr is a tad ambiguous (to me at least), hence the preference for this->.

A transition for this styling is out of scope, but I wanted the preference known at least.

Resolve when you see this. I'd do it, but I want to make sure I'm not shouting into the void.

Comment thread src/utilities/python/InterpreterUtil.cpp

importTopLevelModule("numpy");
py::str runtime_numpy_version = importedTopLevelModules["numpy"].attr("version").attr("version");
if(std::string(runtime_numpy_version) != numpy_version) {
Contributor

Same concern of strict version comparison. It's obviously a little trickier to be less strict here, but it'd help make the system more resilient if it just went "Major minor? Fine". That is, unless it's doing a further check to ensure that some internal processing/runtime linkage assumption is correct.
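A sketch of the relaxed "major.minor" comparison suggested here, with hypothetical helper names; the existing check compares the full version strings with `!=`. As the reply in this thread points out, the implementation may not tolerate even patch-level mismatches, so this only illustrates the proposal.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: extract (major, minor) from a version string such
// as "1.26.4". Returns {-1, -1} unless it starts with "<int>.<int>".
std::pair<int, int> majorMinor(const std::string& version) {
    std::istringstream in(version);
    int major = -1, minor = -1;
    char dot = '\0';
    if (!(in >> major >> dot >> minor) || dot != '.') {
        return {-1, -1};
    }
    return {major, minor};
}

// Relaxed check: accept any runtime version whose major.minor matches the
// build-time version, ignoring the patch level.
bool compatibleVersions(const std::string& built, const std::string& runtime) {
    auto b = majorMinor(built);
    auto r = majorMinor(runtime);
    return b.first >= 0 && b == r;
}
```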

Contributor

See comment above: I think there is a need for this, because the implementation is (independent of this code) not always able to handle version mismatches.

Contributor

Bobby/Phil - mark this as resolved if my concern isn't worth addressing due to high version incompatibility risk.

Comment thread src/utilities/python/InterpreterUtil.cpp
}
}

/* static */ py::object InterpreterUtil::getPyModule(const std::string &name) {
Contributor

What are the chances that this could lead to a py::object leakage (i.e. the py::object living longer than the instance)?

Contributor Author

Probably pretty high if misused. Thankfully, it's not much used throughout the codebase.

If we were writing in Rust, there would be a lifetime parameter on the return value matching that of the InterpreterUtil itself. We don't have lifetime parameters in C++. Maybe in C++ 2035 or something.

Contributor

Probably pretty high if misused.

So it'll bite us immediately 😄

Is there anything we can do to highlight "Hey, check this thing out here if you're running into this type of problem"? An inline comment would only help if you knew exactly where to look for it. If we had logging 100% locked down, we could have a debug statement there, but that's not really the case.

This is all just a spitball - I can't think of a solid solution since I don't think we have a diagnostic guide or anything for it.

Feel free to mark as 'resolved' if it's sort of a 'not much we can reasonably do' situation.
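A standalone model of the hazard under discussion, with every name hypothetical: `MockInterpreter` stands in for the `py::scoped_interpreter` that InterpreterUtil holds, and `MockObject` for a `py::object` returned by `getPyModule()`. Releasing a real `py::object` decrements a Python reference count, which is only valid while the interpreter exists; the mock just counts late releases and prints a warning, as a sketch of one possible form the debug diagnostic mentioned above could take.

```cpp
#include <cassert>
#include <iostream>
#include <optional>

// True while a MockInterpreter exists.
bool g_interpreterAlive = false;
// Number of objects released after the interpreter was finalized.
int g_lateReleases = 0;

struct MockInterpreter {
    MockInterpreter() { g_interpreterAlive = true; }
    ~MockInterpreter() { g_interpreterAlive = false; }
};

struct MockObject {
    ~MockObject() {
        if (!g_interpreterAlive) {
            // With a real py::object this release would be undefined behavior.
            ++g_lateReleases;
            std::cerr << "warning: Python object released after interpreter shutdown\n";
        }
    }
};

// Correct use: the object lives entirely within the interpreter's lifetime.
void scopedUse() {
    MockInterpreter interp;
    MockObject obj;  // declared after interp, so destroyed before it
}

// Misuse: the object escapes the scope that owns the interpreter.
void escapingUse() {
    std::optional<MockObject> escaped;
    {
        MockInterpreter interp;
        escaped.emplace();
    }  // interpreter finalized here
    escaped.reset();  // late release
}
```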

Contributor

@robertbartel robertbartel left a comment

I see a couple of very small things that, assuming I haven't missed something, strictly speaking should be fixed. But otherwise, this looks good to me. Tweak or explain those, and then I'll come back and approve.

Comment thread src/utilities/python/InterpreterUtil.cpp Outdated
Comment thread src/utilities/python/InterpreterUtil.cpp
@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch from 8aee3a3 to 1680d81 Compare May 18, 2026 23:00
@PhilMiller PhilMiller force-pushed the PhilMiller/python-interpreter-stuff branch from 1680d81 to a79ff67 Compare May 18, 2026 23:02
@PhilMiller
Contributor Author

Chris, the rest of your comments that I haven't made changes for concern pre-existing code, that's just being moved from a header file to a source file. So, I think they're out of scope for addressing in this PR. To the extent they ought to be addressed, that is work that can be planned down the line. It's definitely not on our budget or tasking right now.

I'll still reply to the comments.

@christophertubbs
Contributor

Chris, the rest of your comments that I haven't made changes for concern pre-existing code, that's just being moved from a header file to a source file. So, I think they're out of scope for addressing in this PR. To the extent they ought to be addressed, that is work that can be planned down the line. It's definitely not on our budget or tasking right now.

Which is why I noted that you wouldn't want to make major changes when it comes to just moving implementation to the correct location.

The GIL locking probably needs to be handled sooner rather than later, though, and I think it might result in the need for different access patterns, since you probably don't want to play with items like py::object outside the scope of the lock.
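A sketch of the kind of callback-style access pattern this implies: Python-owned state is only reachable inside a function run while the lock is held, so handles are not passed out by default. `std::mutex` is a stand-in here. With pybind11 the lock would be taken with `py::gil_scoped_acquire`, which, unlike this mutex, may be nested on a thread that already holds the GIL. `LockedInterpreter` and `withLock()` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

class LockedInterpreter {
public:
    // Run fn on the shared state while holding the lock. The state itself
    // is never returned, so callers touch it only inside the locked scope.
    template <typename Fn>
    auto withLock(Fn fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(state_);
    }

private:
    std::mutex mutex_;
    std::vector<std::string> state_;  // stands in for cached py::object handles
};
```

This only narrows the door: a callback could still copy a handle out and use it after the lock is released, so it would need to be paired with review discipline or a wrapper type that refuses to escape.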

Do you guys want me to go ahead and add a ticket for it or do you guys want to try and figure out the best way to address it?

Labels

None yet

Projects

None yet

3 participants