Integration and comparison pytest suites are order-dependent and write generated artifacts into the repo #35

## Summary

The integration and MATLAB/Python comparison tests are currently coupled through generated files in repository-level output folders. This makes the comparison suite order-dependent and not self-contained.

## Current behavior

- Integration tests write generated CSV/PNG artifacts into folders such as `test_output/` and `test_plots/`.
- Comparison tests then expect those Python outputs to already exist before comparing against MATLAB reference CSVs.
- Running `python/tests/compare` directly fails if the integration outputs were not generated first.
- Several integration tests are effectively artifact generators: they exercise code and save data, but do not assert key metrics directly.
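
The coupling can be reproduced in miniature. The sketch below is not the project's code: `generate_outputs`, `compare_outputs`, and the file name `result_python.csv` are hypothetical stand-ins for an artifact-generating integration test and a comparison test that share one output folder.

```python
import csv
import tempfile
from pathlib import Path


def generate_outputs(output_dir):
    """Hypothetical stand-in for an integration test that only saves data."""
    output_dir.mkdir(parents=True, exist_ok=True)
    with open(output_dir / "result_python.csv", "w", newline="") as f:
        csv.writer(f).writerow(["value", 1.0])


def compare_outputs(output_dir):
    """Hypothetical stand-in for a comparison test that expects the file to exist."""
    with open(output_dir / "result_python.csv", newline="") as f:
        return list(csv.reader(f))


def demo_order_dependence():
    """Return (failed_when_run_first, rows_read_after_generation)."""
    with tempfile.TemporaryDirectory() as root:
        shared = Path(root) / "test_output" / "test_basic"
        try:
            compare_outputs(shared)  # comparison runs before generation
            failed_first = False
        except FileNotFoundError:
            failed_first = True
        generate_outputs(shared)  # the "integration" step runs
        return failed_first, compare_outputs(shared)
```

The comparison step succeeds only after the generator has run, which is the same ordering constraint described above.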

## Examples

- `python/tests/integration/test_basic.py` writes `test_output/test_basic/*_python.csv` and a PNG but makes no direct assertions.
- `python/tests/integration/test_analyze_spectrum.py` saves spectrum metrics for later comparison rather than asserting them in place.
- `python/tests/compare/test_compare_basic.py` depends on `test_reference/test_basic` and `test_output/test_basic` existing.
- The shared comparison runner now fails loudly when references or generated outputs are missing, which exposes the ordering dependency.

## Why this matters

- `pytest python/tests/compare` is not a reliable standalone command.
- Test results can depend on whether stale `test_output/` files exist from a previous run.
- CI cannot cleanly separate unit, integration, and golden-reference jobs unless prerequisites are explicit.
- Generated artifacts can hide real regressions if they are stale or partially regenerated.
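
To make the stale-artifact risk concrete, here is a minimal sketch with made-up values and hypothetical file names. It assumes the comparison reads a previously saved CSV instead of recomputing the value.

```python
import csv
import math
import tempfile
from pathlib import Path


def save_metric(path, value):
    """Write a single 'metric,value' row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["metric", value])


def load_metric(path):
    """Read back the value from the single row."""
    with open(path, newline="") as f:
        return float(next(csv.reader(f))[1])


def demo_stale_output():
    """Return (stale_comparison_passes, fresh_comparison_passes)."""
    with tempfile.TemporaryDirectory() as root:
        reference = Path(root) / "reference.csv"      # stands in for a MATLAB reference
        output = Path(root) / "output_python.csv"     # stands in for a saved Python output
        save_metric(reference, 1.0)
        save_metric(output, 1.0)  # written by an earlier, correct run

        # Assume a regression: the code would now produce 2.0,
        # but the generator has not been rerun, so the file is stale.
        current_value = 2.0

        stale_passes = math.isclose(load_metric(output), load_metric(reference))
        fresh_passes = math.isclose(current_value, load_metric(reference))
        return stale_passes, fresh_passes
```

The comparison against the stale file passes while a comparison against the freshly computed value would fail, so the regression goes unnoticed until the artifact is regenerated.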

## Suggested fixes

- Make integration tests use `tmp_path` for generated outputs wherever possible.
- Add direct assertions for important metrics in integration tests instead of only saving CSVs.
- Give comparison tests fixtures that generate the required Python outputs in a temporary directory, or document and enforce a separate golden-data workflow.
- Consider splitting artifact generation into a dedicated script/command instead of normal pytest tests.
- Ensure `pytest python/tests/compare` either runs self-contained or skips/fails with a clear documented prerequisite.

## Related

This is related to the stricter comparison runner behavior, but it is a broader test design issue rather than only a runner bug.

