Summary
Several pytest tests use random data without a fixed seed, perform broad/slow parameter sweeps, or only check that plotting functions did not crash. These tests are useful as exploratory smoke coverage, but they are weak as deterministic regression tests.
Examples
Unseeded randomness
Many tests call np.random.randn, np.random.rand, np.random.permutation, or similar global RNG APIs without a fixed seed. Examples include:
- python/tests/unit/aout/test_analyze_error_by_phase.py
- python/tests/unit/aout/test_analyze_error_by_value.py
- python/tests/unit/aout/test_decompose_harmonics.py
- python/tests/unit/calibration/test_verify_estimate_frequencies.py
- several python/tests/unit/spectrum/* tests
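As a rough sketch of what a seeded replacement for the global RNG calls could look like: the helper name make_noisy_sine, the seed value, and the signal parameters are hypothetical and not taken from the repository.

```python
import numpy as np

SEED = 12345  # hypothetical fixed seed; any constant works


def make_noisy_sine(rng, n_samples=1024, noise_std=0.01):
    """Return a sine wave with additive Gaussian noise drawn from `rng`."""
    t = np.arange(n_samples)
    signal = np.sin(2 * np.pi * 13 * t / n_samples)
    return signal + noise_std * rng.standard_normal(n_samples)


# Two generators built from the same seed yield identical data,
# so a test using this input sees the same draw on every run.
first = make_noisy_sine(np.random.default_rng(SEED))
second = make_noisy_sine(np.random.default_rng(SEED))
```

Passing the generator in as an argument, rather than seeding the global state with np.random.seed, keeps each test's randomness independent of test ordering.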
Calibration tests are print-heavy and sweep-heavy
python/tests/unit/calibration/test_verify_calibration_lite.py runs many broad sweeps and prints metrics such as weight error, SNDR, and ENOB, but several sweeps do not assert the expected bounds.
python/tests/unit/calibration/test_verify_estimate_frequencies.py prints whether frequency estimates are good or bad, while the actual threshold assertions are commented out.
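To illustrate turning a printed good/bad verdict into an assertion, here is a minimal sketch. The estimate_frequency function is a hypothetical FFT-peak stand-in, not the project's estimator, and the 0.5 Hz tolerance is an assumed value chosen only for the example.

```python
import numpy as np


def estimate_frequency(signal, fs):
    """Hypothetical stand-in for the estimator under test: FFT peak location."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0  # ignore the DC bin
    peak_bin = int(np.argmax(spectrum))
    return peak_bin * fs / len(signal)


def test_estimate_frequency_within_tolerance():
    fs = 1000.0
    n_samples = 1000
    true_freq = 50.0  # lands exactly on an FFT bin for these settings
    t = np.arange(n_samples) / fs
    signal = np.sin(2 * np.pi * true_freq * t)

    estimate = estimate_frequency(signal, fs)
    error = abs(estimate - true_freq)

    # A print of "good"/"bad" never fails the run; an assertion does.
    assert error <= 0.5, f"frequency error {error:.3g} Hz exceeds 0.5 Hz"
```

The failure message carries the measured error, so the diagnostic value of the old print is kept.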
Plot tests are often smoke-only
Some plotting tests only assert that a figure/file/result exists, for example:
- python/tests/unit/dout/test_plot_residual_scatter.py
- python/tests/unit/spectrum/test_sweep_performance_vs_osr.py
- many AOUT/Spectrum plot tests that only verify PNG creation
Smoke tests are useful, but they should be labeled as such and complemented with structural or numeric checks when the plotted data encodes important behavior.
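A structural plot check might look roughly like the following. The function plot_residual_scatter_stub and its axis labels are hypothetical placeholders for the real plotting function, whose signature and return value may differ.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the test runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_residual_scatter_stub(x, residual):
    """Hypothetical stand-in for the plotting function under test."""
    fig, ax = plt.subplots()
    ax.scatter(x, residual, s=4)
    ax.set_xlabel("Input code")
    ax.set_ylabel("Residual")
    return fig, ax


def test_plot_residual_scatter_structure():
    x = np.arange(8)
    residual = np.array([0.0, 0.5, -0.5, 0.25, -0.25, 0.1, -0.1, 0.0])
    fig, ax = plot_residual_scatter_stub(x, residual)
    try:
        # Structure: one axes, expected labels, one scatter collection.
        assert len(fig.axes) == 1
        assert ax.get_xlabel() == "Input code"
        assert ax.get_ylabel() == "Residual"
        assert len(ax.collections) == 1
        # Data: the plotted points match the input arrays.
        offsets = np.asarray(ax.collections[0].get_offsets())
        np.testing.assert_allclose(offsets[:, 0], x)
        np.testing.assert_allclose(offsets[:, 1], residual)
    finally:
        plt.close(fig)
```

Checks like these catch swapped axes or dropped points, which a file-exists assertion would miss.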
Why this matters
- Unseeded tests can pass locally and fail in CI with a different random draw.
- Long sweeps slow down feedback and make failures harder to isolate.
- Print-only checks do not protect against regressions.
- Smoke-only plot assertions can pass even if the plotted data is wrong.
Suggested fixes
- Replace global RNG calls with np.random.default_rng(seed) or a fixed np.random.RandomState where reproducibility matters.
- Convert printed pass/fail thresholds into explicit assertions.
- Split long calibration sweeps into a small deterministic regression test plus optional stress/performance tests.
- Mark long exploratory sweeps with a dedicated marker such as @pytest.mark.slow.
- For plot tests, assert returned data shape, axis count, labels, plotted line/bar counts, or representative numeric values in addition to checking that output files exist.
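The split between a small regression test and a marked sweep could be sketched as below. The metric quantization_error_rms is a hypothetical stand-in for a calibration metric, and the bounds are assumptions for illustration. The pytest_configure hook is shown inline for brevity but would live in conftest.py.

```python
import numpy as np
import pytest


def pytest_configure(config):
    # Belongs in conftest.py; registers the marker so pytest does not warn about it.
    config.addinivalue_line("markers", "slow: long exploratory sweep")


def quantization_error_rms(n_bits, rng, n_samples=4096):
    """Hypothetical stand-in metric: RMS error of an ideal uniform quantizer."""
    x = rng.uniform(-1.0, 1.0, n_samples)
    step = 2.0 / (2 ** n_bits)
    quantized = np.round(x / step) * step
    return float(np.sqrt(np.mean((quantized - x) ** 2)))


def test_quantizer_regression():
    # Small, seeded, single-point check that runs in the default suite.
    rng = np.random.default_rng(0)
    step = 2.0 / 256
    rms = quantization_error_rms(8, rng)
    # Ideal uniform quantization noise has RMS step / sqrt(12).
    assert rms == pytest.approx(step / np.sqrt(12), rel=0.1)


@pytest.mark.slow
@pytest.mark.parametrize("n_bits", range(4, 17))
def test_quantizer_sweep(n_bits):
    # Broad sweep, still seeded and still asserting a bound.
    rng = np.random.default_rng(n_bits)
    step = 2.0 / (2 ** n_bits)
    assert quantization_error_rms(n_bits, rng) <= step / 2
```

With the marker in place, the sweep can be deselected with pytest -m "not slow" and run on demand with pytest -m slow.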
Expected result
The default pytest suite should be deterministic, reasonably fast, and should fail only when a meaningful behavior contract is violated.