Skip to content

Improvements to testing infrastructure#491

Open
adeliegorce wants to merge 71 commits into
mainfrom
414-some-proposed-improvements-to-testing
Open

Improvements to testing infrastructure#491
adeliegorce wants to merge 71 commits into
mainfrom
414-some-proposed-improvements-to-testing

Conversation

@adeliegorce

@adeliegorce adeliegorce commented May 20, 2026

Copy link
Copy Markdown
Contributor

Closes #414

This PR implements improvements to the testing infrastructure required by #414 such as

  • Don't use UnitTest anywhere
  • Use tmpdir and tmp_path instead of creating files and then deleting them manually.
  • Use pytest.parametrize instead of in-test for-loops.
  • Split up a bunch of long tests into smaller units.
  • Contextualise pytest.raises calls

It is going to be a chunky one so we are starting with a draft PR.

@adeliegorce adeliegorce linked an issue May 20, 2026 that may be closed by this pull request
4 tasks
@codecov

codecov Bot commented May 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.94%. Comparing base (4fce4a3) to head (a4059fc).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #491   +/-   ##
=======================================
  Coverage   95.94%   95.94%           
=======================================
  Files          19       19           
  Lines        6628     6633    +5     
=======================================
+ Hits         6359     6364    +5     
  Misses        269      269           
Flag Coverage Δ
unittests 95.94% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread tests/test_noise.py Outdated
Comment thread tests/test_pstokes.py Outdated
Comment thread tests/test_uvwindow.py Outdated
Comment thread tests/test_uvwindow.py Outdated
Comment thread tests/test_uvwindow.py Outdated
Comment thread tests/test_uvwindow.py Outdated
assert test.kunits.is_equivalent(units.Mpc ** (-1))


def test_uvwindow_from_uvpspec(uvwindow_setup):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, for tests like this that are currently trying to test all the functionality of a particular function, I would instead break the test up into much smaller chunks, each of which ideally tests ONE THING only.

There are MANY examples of this (especially in test_pspec but you haven't touched that yet!)

Comment thread tests/test_uvwindow.py Outdated

# initialise object
test = uvwindow.UVWindow(ftbeam_obj=uvwindow_setup.ft_beam_obj_spw)
kgrid, kperp_norm = test._get_kgrid(bl_len)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wary of these kinds of tests too -- they raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!

@adeliegorce adeliegorce added type: maint: refactoring Refactoring without behavior change. type: maint: style Code style and formatting. labels Jun 3, 2026
@adeliegorce adeliegorce marked this pull request as ready for review June 3, 2026 16:02
@adeliegorce

Copy link
Copy Markdown
Contributor Author

The PR is ready for review! I have implemented all of the checklist, as well as possible - I let the reviewer be the judge of that! there is a lot so I am happy to provide more explanations if needed. I kept and eye on how the refactoring went to make sure no test got removed or emptied of its meaning but it is difficult to be 100% sure. The last task, which should probably be tackled as a separate PR, is to look through the tests and identify the ones that "only raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!"

@steven-murray steven-murray left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this gargantuan effort @adeliegorce!

There are a lot of good things in this PR. In saying that, I've made a bunch of comments here. These are not meant to be exhaustive: almost all of them are representative of a broader pattern that you could ask your chosen AI to go and fix.

I think the big thing to do at first is choose a consistent structure. I prefer a structure where a source module gets a test module, a source function gets a test class, and then there are very modular tests of that function inside the class.

THe other big thing is to try and modularise the tests a lot more: currently there are some absolute humungous tests which should probably be broken down into smaller specific tests.

The other hting I noticed generally is that there is a lot of code repetition for test setup (reading the same beam file many times throughout the test base for example). We could probably slim down the total run time by cleaning this up. I think a targeted request to Claude to identify as many repeated bits of code as possible should help.

Comment thread src/hera_pspec/uvwindow.py Outdated
Comment thread tests/test_cli.py
newuvp = new.get_pspec("pspecgroup", "name")
assert all(blp in vanilla_uvp.get_blpairs() for blp in newuvp.get_blpairs())
assert len(newuvp.get_blpairs()) == len(vanilla_uvp.get_blpairs())
def test_happy_path(vanilla_uvp: UVPSpec, single_baseline_files: list[Path]):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually move these back into the class as they were originally, because soon we'll have #496 which will add more cli functions, and its best to keep them organized.

Comment thread tests/test_container.py Outdated

# Check that power spectra can be overwritten
@pytest.fixture
def container_fname(tmp_path):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def container_fname(tmp_path):
def container_fname(tmp_path: Path) -> Path:

Comment thread tests/test_container.py Outdated
Comment thread tests/test_container.py Outdated


def test_combine_psc_spectra():
def test_combine_psc_spectra(tmp_path):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe too much for this PR, but I would love to see it happen: instead of using files kept around in the data directory, it would be great if we could quickly make valid small UVPSpec objects on the fly to use in testing.

Probably some of the tests should be using stored files -- those that are testing reading old file formats perhaps -- but in general it's much better to create these on the fly (and it means we can see what's IN THEM by looking at the code that's generating them!).

Comment thread tests/test_utils.py
Comment on lines +15 to +18
def uvd():
uvdata = UVData()
uvdata.read_miriad(os.path.join(DATA_PATH, "zen.2458042.17772.xx.HH.uvXA"))
return uvdata

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to be a different uvd than in the other modules?

@adeliegorce adeliegorce Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the local uvd fixture cannot be replaced by any existing conftest fixture. The local fixture loads zen.2458042.17772.xx.HH.uvXA, while conftest has:

  • uvd_zen_2458116 — loads zen.2458116.31939.HH.uvh5 (different file, different format)
  • uvd_zen_even_xx — loads zen.even.xx.LST.1.28828.uvOCRSA (different file)

The tests using uvd (in test_calc_blpair_reds) assert specific baseline counts (e.g. len(bls1) == 12, blps == [((24, 25), (37, 38))], specific antenna indices like 24, 25, 37, 38, 39) that are tied to the exact array geometry of that file. Swapping in a different dataset would break those assertions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK good to know. Even so, I would prefer to name these uvd's more descriptively so we know what on earth they are and why we're using that one not something else!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent question to which I have no answer :) something to keep in mind..

Comment thread tests/test_uvpspec.py Outdated
Comment thread tests/test_uvpspec.py Outdated
Comment thread tests/test_uvwindow.py
class test_FTBeam(unittest.TestCase):
def setUp(self):
@pytest.fixture()
def make_ft_beam_obj():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be a fixture -- it can just be a function.

Comment thread tests/test_uvwindow.py
@adeliegorce

Copy link
Copy Markdown
Contributor Author

Hi! Thanks for having a look.

  • "I prefer a structure where a source module gets a test module, a source function gets a test class, and then there are very modular tests of that function inside the class." -> I am not sure to be familiar with this structure. My understanding was that pytest rather relies on a lot of independent methods and not large classes; sounds like this might not be true. Would you have examples of how you'd like this to be done?
  • "THe other big thing is to try and modularise the tests a lot more" -> split test_pspecdata and test_uvwindow into little chunks as I figured they had the largest independent tests; but will take in the comments and look at other modules
  • "The other hting I noticed generally is that there is a lot of code repetition for test setup" -> It might be surprising but I have already asked Claude to add as many fixtures in common to as many tests as possible :D I'll try to be more specific in my prompts

@steven-murray

Copy link
Copy Markdown
Contributor

Structure

Concerning the structure, you're right that big classes are not used in pytest in the same kind of way they are in python. It is not idiomatic to rely on advanced features of classes in tests. However, I like to use them as a point in the heirarchy, i.e.:

package/
  module:
     class::
       test1()
       test2()

That is, their only role is to group related tests. You certainly don't have to use this structure -- you could instead do something like:

package/
  subpackage/
    module:
      test1()
      test2()

Either way, the point to realize is that one function in the source code (e.g. pspec_run() should have many tests. So you should not have one test function per source function. There are lots of ways around this. Let's say for example I wanted to test the function sourcefunc() in the module sourcemod. I could do it like this:

tests/
  test_sourcemod.py
    class TestSourceFunc:
      test1()
      test2()

or I could do it like this:

tests/
  sourcemod/
    test_sourcefunc.py:
      test1()
      test2()

Or I could do it like this:

tests/
  test_sourcemod.py
    test_sourcefunc_test1()
    test_sourcefunc_test2()

I prefer the first way most of the time. Sometimes I differ from that, for example if a module really only has one function in it, then I'll often make my module correspond to the function and just have bare tests in there with no class. Or sometimes I have a test module that doesn't correspond to a particular source module, but rather corresponds to a kind of testing (for example in my powerbox code I have one test module dedicated to doing roundtrip testing of producing a gaussian random field then estimating the power from that GRF and comparing to the known input).

There are also complications -- sometimes there is an extra bit of heirarchy in the source, i.e. a class. How do you test the UVPSPec object, for example? My take would be to consider classes like this at the same level as modules so have a test module called test_uvpspec.py then in that have classes that correspond to the major methods of the object, and within them have individual tests of those methods.

Modularization

Yeah I really like what you did with uvwindow. I think similar things can be done with other existing test functions (some of which I pointed out). Nevertheless, this is not absolutely required for this PR. It would be nice, but we can always iterate over time.

Repetition

Haha, well it certainly did some of it. I would tell it that essentially any time it is reading a file from the DATA_PATH and creating an object from it, that should be a fixture. The only caveats are tests that are specifically testing aspects of the reading process.

I would also love to check whether we really need so many different data files -- what is the essential difference between the data products that we really need to have so many different ones? It seems like we use them essentially at random much of the time. But this is a bigger question and might take more effort than it is worth in this PR?

adeliegorce and others added 23 commits June 5, 2026 14:52
Co-authored-by: Steven Murray <steven.g.murray@asu.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: ci CI, workflow, coverage, warning checks, and release automation. type: maint: refactoring Refactoring without behavior change. type: maint: style Code style and formatting. type: testing Test additions, fixtures, and test-infrastructure changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some proposed improvements to testing

2 participants