Improvements to testing infrastructure#491
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #491 +/- ##
=======================================
Coverage 95.94% 95.94%
=======================================
Files 19 19
Lines 6628 6633 +5
=======================================
+ Hits 6359 6364 +5
Misses 269 269
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| assert test.kunits.is_equivalent(units.Mpc ** (-1)) | ||
|
|
||
|
|
||
| def test_uvwindow_from_uvpspec(uvwindow_setup): |
There was a problem hiding this comment.
In general, for tests like this that are currently trying to test all the functionality of a particular function, I would instead break the test up into much smaller chunks, each of which ideally tests ONE THING only.
There are MANY examples of this (especially in test_pspec but you haven't touched that yet!)
|
|
||
| # initialise object | ||
| test = uvwindow.UVWindow(ftbeam_obj=uvwindow_setup.ft_beam_obj_spw) | ||
| kgrid, kperp_norm = test._get_kgrid(bl_len) |
There was a problem hiding this comment.
I'm wary of these kinds of tests too -- they raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…ithub.com/HERA-Team/hera_pspec into 414-some-proposed-improvements-to-testing
for more information, see https://pre-commit.ci
|
The PR is ready for review! I have implemented all of the checklist, as well as possible - I let the reviewer be the judge of that! there is a lot so I am happy to provide more explanations if needed. I kept and eye on how the refactoring went to make sure no test got removed or emptied of its meaning but it is difficult to be 100% sure. The last task, which should probably be tackled as a separate PR, is to look through the tests and identify the ones that "only raise the coverage, but don't test anything except that the function doesn't blow up with an error. It would be better to actually test that it gives correct results!" |
steven-murray
left a comment
There was a problem hiding this comment.
Thank you for this gargantuan effort @adeliegorce!
There are a lot of good things in this PR. In saying that, I've made a bunch of comments here. These are not meant to be exhaustive: almost all of them are representative of a broader pattern that you could ask your chosen AI to go and fix.
I think the big thing to do at first is choose a consistent structure. I prefer a structure where a source module gets a test module, a source function gets a test class, and then there are very modular tests of that function inside the class.
THe other big thing is to try and modularise the tests a lot more: currently there are some absolute humungous tests which should probably be broken down into smaller specific tests.
The other hting I noticed generally is that there is a lot of code repetition for test setup (reading the same beam file many times throughout the test base for example). We could probably slim down the total run time by cleaning this up. I think a targeted request to Claude to identify as many repeated bits of code as possible should help.
| newuvp = new.get_pspec("pspecgroup", "name") | ||
| assert all(blp in vanilla_uvp.get_blpairs() for blp in newuvp.get_blpairs()) | ||
| assert len(newuvp.get_blpairs()) == len(vanilla_uvp.get_blpairs()) | ||
| def test_happy_path(vanilla_uvp: UVPSpec, single_baseline_files: list[Path]): |
There was a problem hiding this comment.
I would actually move these back into the class as they were originally, because soon we'll have #496 which will add more cli functions, and its best to keep them organized.
|
|
||
| # Check that power spectra can be overwritten | ||
| @pytest.fixture | ||
| def container_fname(tmp_path): |
There was a problem hiding this comment.
| def container_fname(tmp_path): | |
| def container_fname(tmp_path: Path) -> Path: |
|
|
||
|
|
||
| def test_combine_psc_spectra(): | ||
| def test_combine_psc_spectra(tmp_path): |
There was a problem hiding this comment.
This is maybe too much for this PR, but I would love to see it happen: instead of using files kept around in the data directory, it would be great if we could quickly make valid small UVPSpec objects on the fly to use in testing.
Probably some of the tests should be using stored files -- those that are testing reading old file formats perhaps -- but in general it's much better to create these on the fly (and it means we can see what's IN THEM by looking at the code that's generating them!).
| def uvd(): | ||
| uvdata = UVData() | ||
| uvdata.read_miriad(os.path.join(DATA_PATH, "zen.2458042.17772.xx.HH.uvXA")) | ||
| return uvdata |
There was a problem hiding this comment.
Does this really need to be a different uvd than in the other modules?
There was a problem hiding this comment.
No, the local uvd fixture cannot be replaced by any existing conftest fixture. The local fixture loads zen.2458042.17772.xx.HH.uvXA, while conftest has:
- uvd_zen_2458116 — loads zen.2458116.31939.HH.uvh5 (different file, different format)
- uvd_zen_even_xx — loads zen.even.xx.LST.1.28828.uvOCRSA (different file)
The tests using uvd (in test_calc_blpair_reds) assert specific baseline counts (e.g. len(bls1) == 12, blps == [((24, 25), (37, 38))], specific antenna indices like 24, 25, 37, 38, 39) that are tied to the exact array geometry of that file. Swapping in a different dataset would break those assertions.
There was a problem hiding this comment.
OK good to know. Even so, I would prefer to name these uvd's more descriptively so we know what on earth they are and why we're using that one not something else!
There was a problem hiding this comment.
This is an excellent question to which I have no answer :) something to keep in mind..
| class test_FTBeam(unittest.TestCase): | ||
| def setUp(self): | ||
| @pytest.fixture() | ||
| def make_ft_beam_obj(): |
There was a problem hiding this comment.
This doesn't need to be a fixture -- it can just be a function.
|
Hi! Thanks for having a look.
|
StructureConcerning the structure, you're right that big classes are not used in pytest in the same kind of way they are in python. It is not idiomatic to rely on advanced features of classes in tests. However, I like to use them as a point in the heirarchy, i.e.: That is, their only role is to group related tests. You certainly don't have to use this structure -- you could instead do something like: Either way, the point to realize is that one function in the source code (e.g. or I could do it like this: Or I could do it like this: I prefer the first way most of the time. Sometimes I differ from that, for example if a module really only has one function in it, then I'll often make my module correspond to the function and just have bare tests in there with no class. Or sometimes I have a test module that doesn't correspond to a particular source module, but rather corresponds to a kind of testing (for example in my powerbox code I have one test module dedicated to doing roundtrip testing of producing a gaussian random field then estimating the power from that GRF and comparing to the known input). There are also complications -- sometimes there is an extra bit of heirarchy in the source, i.e. a class. How do you test the ModularizationYeah I really like what you did with uvwindow. I think similar things can be done with other existing test functions (some of which I pointed out). Nevertheless, this is not absolutely required for this PR. It would be nice, but we can always iterate over time. RepetitionHaha, well it certainly did some of it. I would tell it that essentially any time it is reading a file from the DATA_PATH and creating an object from it, that should be a fixture. The only caveats are tests that are specifically testing aspects of the reading process. I would also love to check whether we really need so many different data files -- what is the essential difference between the data products that we really need to have so many different ones? It seems like we use them essentially at random much of the time. But this is a bigger question and might take more effort than it is worth in this PR? |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Co-authored-by: Steven Murray <steven.g.murray@asu.edu>
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Closes #414
This PR implements improvements to the testing infrastructure required by #414 such as
pytest.raisescallsIt is going to be a chunky one so we are starting with a draft PR.