Improvements to testing infrastructure by adeliegorce · Pull Request #491 · HERA-Team/hera_pspec

adeliegorce · 2026-05-20T13:41:14Z

Closes #414

This PR implements improvements to the testing infrastructure required by #414 such as

Don't use UnitTest anywhere
Use tmpdir and tmp_path instead of creating files and then deleting them manually.
Use pytest.parametrize instead of in-test for-loops.
Split up a bunch of long tests into smaller units.
Contextualise pytest.raises calls

It is going to be a chunky one so we are starting with a draft PR.

for more information, see https://pre-commit.ci

codecov · 2026-05-20T13:48:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.94%. Comparing base (4fce4a3) to head (a4059fc).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #491   +/-   ##
=======================================
  Coverage   95.94%   95.94%           
=======================================
  Files          19       19           
  Lines        6628     6633    +5     
=======================================
+ Hits         6359     6364    +5     
  Misses        269      269

Flag	Coverage Δ
unittests	`95.94% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

steven-murray · 2026-05-20T21:05:01Z

+    assert test.kunits.is_equivalent(units.Mpc ** (-1))
+
+
+def test_uvwindow_from_uvpspec(uvwindow_setup):


In general, for tests like this that are currently trying to test all the functionality of a particular function, I would instead break the test up into much smaller chunks, each of which ideally tests ONE THING only.

There are MANY examples of this (especially in test_pspec but you haven't touched that yet!)

steven-murray · 2026-05-20T21:06:15Z

+
+    # initialise object
+    test = uvwindow.UVWindow(ftbeam_obj=uvwindow_setup.ft_beam_obj_spw)
+    kgrid, kperp_norm = test._get_kgrid(bl_len)


I'm wary of these kinds of tests too -- they raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!

for more information, see https://pre-commit.ci

…ithub.com/HERA-Team/hera_pspec into 414-some-proposed-improvements-to-testing

for more information, see https://pre-commit.ci

adeliegorce · 2026-06-03T16:04:38Z

The PR is ready for review! I have implemented all of the checklist, as well as possible - I let the reviewer be the judge of that! there is a lot so I am happy to provide more explanations if needed. I kept and eye on how the refactoring went to make sure no test got removed or emptied of its meaning but it is difficult to be 100% sure. The last task, which should probably be tackled as a separate PR, is to look through the tests and identify the ones that "only raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!"

steven-murray

Thank you for this gargantuan effort @adeliegorce!

There are a lot of good things in this PR. In saying that, I've made a bunch of comments here. These are not meant to be exhaustive: almost all of them are representative of a broader pattern that you could ask your chosen AI to go and fix.

I think the big thing to do at first is choose a consistent structure. I prefer a structure where a source module gets a test module, a source function gets a test class, and then there are very modular tests of that function inside the class.

THe other big thing is to try and modularise the tests a lot more: currently there are some absolute humungous tests which should probably be broken down into smaller specific tests.

The other hting I noticed generally is that there is a lot of code repetition for test setup (reading the same beam file many times throughout the test base for example). We could probably slim down the total run time by cleaning this up. I think a targeted request to Claude to identify as many repeated bits of code as possible should help.

steven-murray · 2026-06-03T18:32:48Z

-        newuvp = new.get_pspec("pspecgroup", "name")
-        assert all(blp in vanilla_uvp.get_blpairs() for blp in newuvp.get_blpairs())
-        assert len(newuvp.get_blpairs()) == len(vanilla_uvp.get_blpairs())
+def test_happy_path(vanilla_uvp: UVPSpec, single_baseline_files: list[Path]):


I would actually move these back into the class as they were originally, because soon we'll have #496 which will add more cli functions, and its best to keep them organized.

steven-murray · 2026-06-03T18:34:11Z

-
-        # Check that power spectra can be overwritten
+@pytest.fixture
+def container_fname(tmp_path):


Suggested change

def container_fname(tmp_path):

def container_fname(tmp_path: Path) -> Path:

steven-murray · 2026-06-03T18:43:55Z



-def test_combine_psc_spectra():
+def test_combine_psc_spectra(tmp_path):


This is maybe too much for this PR, but I would love to see it happen: instead of using files kept around in the data directory, it would be great if we could quickly make valid small UVPSpec objects on the fly to use in testing.

Probably some of the tests should be using stored files -- those that are testing reading old file formats perhaps -- but in general it's much better to create these on the fly (and it means we can see what's IN THEM by looking at the code that's generating them!).

steven-murray · 2026-06-03T19:24:59Z

+def uvd():
+    uvdata = UVData()
+    uvdata.read_miriad(os.path.join(DATA_PATH, "zen.2458042.17772.xx.HH.uvXA"))
+    return uvdata


Does this really need to be a different uvd than in the other modules?

No, the local uvd fixture cannot be replaced by any existing conftest fixture. The local fixture loads zen.2458042.17772.xx.HH.uvXA, while conftest has:

uvd_zen_2458116 — loads zen.2458116.31939.HH.uvh5 (different file, different format)

uvd_zen_even_xx — loads zen.even.xx.LST.1.28828.uvOCRSA (different file)

The tests using uvd (in test_calc_blpair_reds) assert specific baseline counts (e.g. len(bls1) == 12, blps == [((24, 25), (37, 38))], specific antenna indices like 24, 25, 37, 38, 39) that are tied to the exact array geometry of that file. Swapping in a different dataset would break those assertions.

OK good to know. Even so, I would prefer to name these uvd's more descriptively so we know what on earth they are and why we're using that one not something else!

This is an excellent question to which I have no answer :) something to keep in mind..

steven-murray · 2026-06-03T19:32:07Z

-class test_FTBeam(unittest.TestCase):
-    def setUp(self):
+@pytest.fixture()
+def make_ft_beam_obj():


This doesn't need to be a fixture -- it can just be a function.

adeliegorce · 2026-06-03T19:49:19Z

Hi! Thanks for having a look.

"I prefer a structure where a source module gets a test module, a source function gets a test class, and then there are very modular tests of that function inside the class." -> I am not sure to be familiar with this structure. My understanding was that pytest rather relies on a lot of independent methods and not large classes; sounds like this might not be true. Would you have examples of how you'd like this to be done?
"THe other big thing is to try and modularise the tests a lot more" -> split test_pspecdata and test_uvwindow into little chunks as I figured they had the largest independent tests; but will take in the comments and look at other modules
"The other hting I noticed generally is that there is a lot of code repetition for test setup" -> It might be surprising but I have already asked Claude to add as many fixtures in common to as many tests as possible :D I'll try to be more specific in my prompts

steven-murray · 2026-06-04T08:23:09Z

Structure

Concerning the structure, you're right that big classes are not used in pytest in the same kind of way they are in python. It is not idiomatic to rely on advanced features of classes in tests. However, I like to use them as a point in the heirarchy, i.e.:

package/
  module:
     class::
       test1()
       test2()

That is, their only role is to group related tests. You certainly don't have to use this structure -- you could instead do something like:

package/
  subpackage/
    module:
      test1()
      test2()

Either way, the point to realize is that one function in the source code (e.g. pspec_run() should have many tests. So you should not have one test function per source function. There are lots of ways around this. Let's say for example I wanted to test the function sourcefunc() in the module sourcemod. I could do it like this:

tests/
  test_sourcemod.py
    class TestSourceFunc:
      test1()
      test2()

or I could do it like this:

tests/
  sourcemod/
    test_sourcefunc.py:
      test1()
      test2()

Or I could do it like this:

tests/
  test_sourcemod.py
    test_sourcefunc_test1()
    test_sourcefunc_test2()

I prefer the first way most of the time. Sometimes I differ from that, for example if a module really only has one function in it, then I'll often make my module correspond to the function and just have bare tests in there with no class. Or sometimes I have a test module that doesn't correspond to a particular source module, but rather corresponds to a kind of testing (for example in my powerbox code I have one test module dedicated to doing roundtrip testing of producing a gaussian random field then estimating the power from that GRF and comparing to the known input).

There are also complications -- sometimes there is an extra bit of heirarchy in the source, i.e. a class. How do you test the UVPSPec object, for example? My take would be to consider classes like this at the same level as modules so have a test module called test_uvpspec.py then in that have classes that correspond to the major methods of the object, and within them have individual tests of those methods.

Modularization

Yeah I really like what you did with uvwindow. I think similar things can be done with other existing test functions (some of which I pointed out). Nevertheless, this is not absolutely required for this PR. It would be nice, but we can always iterate over time.

Repetition

Haha, well it certainly did some of it. I would tell it that essentially any time it is reading a file from the DATA_PATH and creating an object from it, that should be a fixture. The only caveats are tests that are specifically testing aspects of the reading process.

I would also love to check whether we really need so many different data files -- what is the essential difference between the data products that we really need to have so many different ones? It seems like we use them essentially at random much of the time. But this is a bigger question and might take more effort than it is worth in this PR?

for more information, see https://pre-commit.ci

Co-authored-by: Steven Murray <steven.g.murray@asu.edu>

for more information, see https://pre-commit.ci

First part of switch from unittest to pytest

4ecdab3

adeliegorce linked an issue May 20, 2026 that may be closed by this pull request

Some proposed improvements to testing #414

Open

4 tasks

pre-commit-ci Bot and others added 2 commits May 20, 2026 13:41

[pre-commit.ci] auto fixes from pre-commit.com hooks

ffaffae

for more information, see https://pre-commit.ci

Merge branch 'main' into 414-some-proposed-improvements-to-testing

a5ade9d