
Fix pybind module link leak when testing+pybind enabled (2.15.0 regression) #75

Merged
gagelarsen merged 1 commit into master from fix/testing-sources-runner-only on May 14, 2026

@gagelarsen

Summary

2.15.0 enabled the combined pybind=True+testing=True+Debug configuration so coverage runs would exercise both run_cxx_tests and run_python_tests against the same instrumented binary. xmscore's first coverage run on 2.15.0 immediately failed:

```
ImportError: ... /xms/core/_xmscore.cpython-313-x86_64-linux-gnu.so:
undefined symbol: _ZN7CxxTest12charToStringEcPc
```

That's CxxTest::charToString(char, char*). It's declared in cxxtest/ValueTraits.h:85 but defined only in cxxtest/ValueTraits.cpp (a .cpp file shipped inside cxxtest's include dir). ValueTraits.cpp is in turn included by cxxtest/Root.cpp, and Root.cpp is included only by the cxxtestgen-generated runner.cpp. Any translation unit that references charToString outside the runner therefore has no definition to link against.
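
For readers unfamiliar with Itanium C++ ABI mangling, the following is a minimal sketch of how the symbol in the error decodes to that signature. It is not a general demangler: it handles only nested, non-template names with a few builtin parameter types, and the function name `demangle_simple` is hypothetical. In practice a tool such as `c++filt` does this job.

```python
import re

# Only a handful of builtin type codes from the Itanium C++ ABI; enough for this symbol.
BUILTIN_TYPES = {"c": "char", "v": "void", "i": "int", "b": "bool"}


def demangle_simple(symbol: str) -> str:
    """Decode a nested, non-template function symbol of the form _ZN...E<params>."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN...E nested names are handled in this sketch")
    pos = 3
    parts = []
    # Each name component is <decimal length><identifier>; 'E' closes the nested name.
    while symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported name encoding at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    pos += 1  # skip the closing 'E'

    # Parameter types follow; each 'P' means "pointer to" the type after it.
    params = []
    pointers = 0
    for code in symbol[pos:]:
        if code == "P":
            pointers += 1
        elif code in BUILTIN_TYPES:
            params.append(BUILTIN_TYPES[code] + "*" * pointers)
            pointers = 0
        else:
            raise ValueError(f"unsupported type code {code!r}")
    if params == ["void"]:
        params = []  # a lone 'v' encodes an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle_simple("_ZN7CxxTest12charToStringEcPc"))
# prints: CxxTest::charToString(char, char*)
```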

xmscore's TestTools.cpp uses CxxTest::ValueTraits<char> (whose inline ctor calls charToString). The template was compiling testing_sources like TestTools.cpp into the main static library — exactly the library the pybind module links. The pybind module link closure therefore contained an unresolved CxxTest symbol that the dynamic loader couldn't find at Python import time.

Pre-2.15.0 this was a latent bug: BUILD_TESTING+IS_PYTHON_BUILD was an impossible combination so the leak never reached daylight. 2.15.0 made the combination reachable, and the integration test (which the user explicitly approved despite my "land #68 first" advice) didn't catch it because the stub fixture had no testing_sources at all.

Fix

  1. CMakeLists.txt.jinja — remove the list(APPEND ${library_name}_sources testing_sources) block. Plumb testing_sources into a ${library_name}_testing_sources variable instead, and add it to the runner target via target_sources(runner PRIVATE ...) (cxxtest path) or directly in add_executable(runner ...) (gtest path). This aligns with the documented contract in docs/USAGE.md §5.2: "testing_sources are compiled into the test runner only."
  2. Stub fixture — add stub/testing/test_tools.cpp whose body uses ValueTraits<char> to force a CxxTest::charToString reference, plus test_tools.h and a call from stub.t.h so the .o gets pulled into the runner's link closure. With the pre-fix template, a coverage build of this stub would fail with the same dlopen error; with the fix, it succeeds. This closes the canary fidelity gap I saved to memory two messages before the regression shipped, and immediately violated.
  3. build_file_generator.py — default library_sources, library_headers, testing_headers, pybind_sources, pybind_headers to []. The docs already promised this; the code did not. (Caught while reproducing in WSL: a stub without all five lists tripped on StrictUndefined before even reaching the linker bug.)
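
The defaulting described in item 3 can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in build_file_generator.py: the helper name `with_list_defaults`, the constant `OPTIONAL_LIST_KEYS`, and the shape of the context dict are hypothetical. Only the five key names come from the description above.

```python
# The five list keys that this PR defaults to []; everything else here is illustrative.
OPTIONAL_LIST_KEYS = (
    "library_sources",
    "library_headers",
    "testing_headers",
    "pybind_sources",
    "pybind_headers",
)


def with_list_defaults(context: dict) -> dict:
    """Return a copy of the template context with every optional list key present."""
    filled = dict(context)
    for key in OPTIONAL_LIST_KEYS:
        # A fresh list per key, so no two keys share one mutable default.
        filled.setdefault(key, [])
    return filled


# Hypothetical minimal stub context that omits all five optional lists.
stub_context = {
    "library_name": "stub",
    "testing_sources": ["stub/testing/test_tools.cpp"],
}
context = with_list_defaults(stub_context)
print(sorted(context))
```

With every key present before rendering, a strict-undefined template environment has nothing to trip on, so a minimal stub reaches the compile and link steps where the actual bug lives.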

Downstream impact

Any xms library that was implicitly relying on the buggy behavior — depending on xmscore as a Conan dep and using xmscore::TestTools symbols via the linked static lib — will now find those symbols missing. The documented contract has always been runner-only; such libraries are reaching outside the contract. If this turns out to bite more than xmscore, the follow-up is to expose a separate ${library}_testing static lib in package_info when testing=True. Not bundled here to keep the patch small.

Test plan

  • pytest tests/ -v — 532 unit tests pass, 2 expected skips.
  • flake8 . clean.
  • Regenerated stub CMakeLists.txt in WSL and confirmed: set(stub_testing_sources stub/testing/test_tools.cpp) lands; add_library(${PROJECT_NAME} STATIC ${stub_sources} ${stub_headers}) no longer includes testing_sources; target_sources(runner PRIVATE ${stub_testing_sources}) wires them into the runner.
  • xmscore Coverage workflow re-run against 2.15.1 once published — green confirms the fix.
  • Local end-to-end exercise of the stub integration test is blocked on the credential issue I hit (pybind11/3.0.1 not in the conancenter binary cache for our profile, conan tries Aquaveo, prompts interactively, fails). Diagnostic chain is otherwise tight enough to ship on.

🤖 Generated with Claude Code

Regression caught by xmscore CI in 2.15.0: under the new
``pybind=True+testing=True+Debug`` coverage configuration, the recipe's
``build()`` runs ``run_python_tests`` which loads the pybind module —
but the .so fails dlopen with ``undefined symbol:
CxxTest::charToString(char, char*)``.

Root cause is a latent bug in ``CMakeLists.txt.jinja``: it appends
``testing_sources`` (e.g. xmscore's ``TestTools.cpp``) to the main
library's source list. The main library is what the pybind module
links against. ``TestTools.cpp`` includes ``cxxtest/ValueTraits.h``
and uses ``ValueTraits<char>`` whose inline ctor calls
``CxxTest::charToString`` — declared in that header but defined only
in ``cxxtest/ValueTraits.cpp`` (which is included via
``cxxtest/Root.cpp`` only by the cxxtestgen-generated runner.cpp).
Pre-2.15.0 the combination ``BUILD_TESTING+IS_PYTHON_BUILD`` was
impossible so the leak was latent; 2.15.0 made it reachable and the
canary missed it because the stub fixture had no testing_sources.

Fix the template to compile testing_sources directly into the runner
target (matching the documented contract: "testing_sources are
compiled into the test runner only"). The main library no longer
carries those translation units so the pybind module's link closure
contains no CxxTest references.

Also:

* Update the stub fixture: add ``stub/testing/test_tools.cpp`` whose
  body forces a CxxTest::charToString reference via
  ``ValueTraits<char>``, and call it from ``stub.t.h`` so the
  test_tools .o gets pulled into the runner's link closure. With the
  pre-fix template a coverage build of the stub would fail with the
  same dlopen error; with the fix it succeeds. Closes the canary
  fidelity gap saved in feedback_canary_test_fidelity.md two messages
  before the regression shipped.
* Default ``library_sources``, ``library_headers``, ``testing_headers``,
  ``pybind_sources``, and ``pybind_headers`` to ``[]`` in the
  generator. The docs already promise this; the code did not.

Downstream impact: any xms library that was relying on the buggy
behavior (linking against xmscore's static lib to consume xmscore's
TestTools.cpp symbols) will now fail to find those symbols. The
documented contract has always been that testing_sources are runner-
only — such libraries are reaching outside the contract and need to
either duplicate the testing helpers or wait for a follow-up that
exposes a separate testing static lib.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gagelarsen merged commit 1d054c2 into master on May 14, 2026
6 checks passed
gagelarsen deleted the fix/testing-sources-runner-only branch on May 14, 2026 22:32