
Integration and comparison pytest suites are order-dependent and write generated artifacts into the repo #35

@vivian99cc-a11y


Summary

The integration and MATLAB/Python comparison tests are currently coupled through generated files in repository-level output folders. This makes the comparison suite order-dependent and not self-contained.

Current behavior

  • Integration tests write generated CSV/PNG artifacts into folders such as test_output/ and test_plots/.
  • Comparison tests then expect those Python outputs to already exist before comparing against MATLAB reference CSVs.
  • Running pytest python/tests/compare on its own fails if the integration outputs have not been generated first.
  • Several integration tests are effectively artifact generators: they exercise code and save data, but do not assert key metrics directly.

Examples

  • python/tests/integration/test_basic.py writes test_output/test_basic/*_python.csv and a PNG without direct assertions.
  • python/tests/integration/test_analyze_spectrum.py saves spectrum metrics for later comparison rather than asserting them in place.
  • python/tests/compare/test_compare_basic.py depends on test_reference/test_basic and test_output/test_basic existing.
  • The shared comparison runner now fails loudly when references or generated outputs are missing, which exposes the ordering dependency.

Why this matters

  • pytest python/tests/compare is not a reliable standalone command.
  • Test results can depend on whether stale test_output/ files exist from a previous run.
  • CI cannot cleanly separate unit, integration, and golden-reference jobs unless prerequisites are explicit.
  • Generated artifacts can hide real regressions if they are stale or partially regenerated.

Suggested fixes

  • Make integration tests use tmp_path for generated outputs wherever possible.
  • Add direct assertions for important metrics in integration tests instead of only saving CSVs.
  • Give comparison tests fixtures that generate the required Python outputs in a temporary directory, or document and enforce a separate golden-data workflow.
  • Consider splitting artifact generation into a dedicated script/command instead of normal pytest tests.
  • Ensure that pytest python/tests/compare either runs self-contained or skips/fails with a clear message naming the documented prerequisite.
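A minimal sketch of the first three suggestions, written as a single file for brevity (in the repo it would be split between a conftest.py and the test modules). Everything here is hypothetical: generate_basic_outputs stands in for whatever test_basic.py actually computes and writes placeholder data, the `_matlab` reference-file naming is invented, rel_tol=1e-6 is an arbitrary tolerance, and REFERENCE_ROOT assumes pytest is launched from the directory containing test_reference/. Outputs go to pytest's tmp_path / tmp_path_factory instead of test_output/, the integration test asserts on its result directly, and the comparison test gets its Python outputs from a session-scoped fixture, skipping with an explicit prerequisite message if the MATLAB references are absent.

```python
# Single-file sketch; in practice split into conftest.py and the test modules.
import csv
import math
from pathlib import Path

import pytest

# Hypothetical: assumes pytest runs from the directory that holds test_reference/.
REFERENCE_ROOT = Path("test_reference")


def generate_basic_outputs(out_dir: Path) -> Path:
    """Hypothetical stand-in for the computation test_basic.py performs.

    Writes a *_python.csv into out_dir (never into the repo) and returns out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = [(i, i * 0.5) for i in range(5)]  # placeholder data, not real results
    with open(out_dir / "signal_python.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "value"])
        writer.writerows(rows)
    return out_dir


def compare_csv(actual: Path, expected: Path,
                rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> list[str]:
    """Compare two CSVs cell by cell; return mismatch descriptions (empty = match)."""
    with open(actual, newline="") as fa, open(expected, newline="") as fe:
        a_rows, e_rows = list(csv.reader(fa)), list(csv.reader(fe))
    if len(a_rows) != len(e_rows):
        return [f"row count {len(a_rows)} != {len(e_rows)}"]
    problems = []
    for r, (a_row, e_row) in enumerate(zip(a_rows, e_rows)):
        if len(a_row) != len(e_row):
            problems.append(f"row {r}: {len(a_row)} columns != {len(e_row)}")
            continue
        for c, (a, e) in enumerate(zip(a_row, e_row)):
            try:
                ok = math.isclose(float(a), float(e), rel_tol=rel_tol, abs_tol=abs_tol)
            except ValueError:
                ok = a == e  # non-numeric cells such as headers compare as text
            if not ok:
                problems.append(f"row {r}, col {c}: {a!r} != {e!r}")
    return problems


def require_dir(path: Path, prerequisite: str) -> Path:
    """Skip with the documented prerequisite instead of an obscure failure."""
    if not path.is_dir():
        pytest.skip(f"{path} not found; prerequisite: {prerequisite}")
    return path


@pytest.fixture(scope="session")
def basic_outputs(tmp_path_factory):
    # Generated once per session in a fresh temporary directory.
    return generate_basic_outputs(tmp_path_factory.mktemp("test_basic"))


def test_basic_metrics(tmp_path):
    # Integration test: assert on the result instead of only saving a CSV.
    out = generate_basic_outputs(tmp_path)
    with open(out / "signal_python.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    assert len(rows) == 5
    assert float(rows[-1]["value"]) == pytest.approx(2.0)


def test_compare_basic(basic_outputs):
    ref_dir = require_dir(REFERENCE_ROOT / "test_basic",
                          "MATLAB reference CSVs checked in under test_reference/")
    py_csvs = sorted(basic_outputs.glob("*_python.csv"))
    assert py_csvs, "generator produced no *_python.csv files"
    for py_csv in py_csvs:
        # Hypothetical naming: foo_python.csv is compared against foo_matlab.csv.
        ref_csv = ref_dir / py_csv.name.replace("_python", "_matlab")
        assert ref_csv.is_file(), f"no MATLAB reference for {py_csv.name}"
        assert compare_csv(py_csv, ref_csv, rel_tol=1e-6) == []
```

With this layout, pytest python/tests/compare no longer depends on an earlier integration run or on stale test_output/ contents, and the session-scoped fixture means the generation cost is paid once per run rather than once per comparison test.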

Related

This is related to the stricter comparison runner behavior, but it is a broader test design issue rather than only a runner bug.
