
Add test for writing and reading Legendre polynomials from disk #413

Open
samhatfield wants to merge 20 commits into develop from feat/add_legpoly_read_test



@samhatfield

Collaborator

Before merging #409 I want to add a test for doing a direct transform with Legendre polynomials read from disk. First, I want to check whether we can call SETUP_TRANS with disk-read polynomials at all. This PR adds a test which calls SETUP_TRANS once to write the polynomials to disk and then again, in the same run, to read the polynomials back from the same file. However, the test currently segfaults, which indicates either a bug in the code or improper use. @wdeconinck can you see anything wrong with how I'm calling SETUP_TRANS?

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@wdeconinck

wdeconinck commented Jun 10, 2026

Collaborator

The read/write feature was originally added for the serial postprocessing of operational (high) resolutions, using the following stack hierarchy:
MARS -> MIR -> Atlas -> transi -> trans
The Legendre coefficients could then be cached or precomputed. At operational resolution these are quite large, and computing them in serial is too expensive. In our parallel IFS context the recomputation is not an issue, so caching only adds complexity.
All this dates from the time when trans was still part of the IFS and transi was a standalone repository.
As such, I had created a unit test in transi to exercise this: https://github.com/ecmwf-ifs/ectrans/blob/develop/tests/transi/transi_test_io.c
It may give some hints on how to do this in pure Fortran.
Although we don't really exercise this in any Fortran context, it is still a useful Fortran test to add.
Note that all of this applies to a serial context (NPROC == 1) only.

@samhatfield

Collaborator Author

It was just a simple precision bug: the writer and reader always assumed 8-byte reals, so the single-precision tests failed. I've fixed this by choosing the type of the polynomials based on JPRB and embedding the real size in the legpol file metadata.
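To illustrate the fix, here is a minimal C sketch (C rather than Fortran, in the spirit of the transi tests; this is not the actual READ_LEGPOL/WRITE_LEGPOL code). It assumes a hypothetical `real_t` typedef standing in for JPRB: the writer records `sizeof(real_t)` in the file metadata, and the reader rejects a file whose recorded size differs instead of silently misreading it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for JPRB: working precision chosen at compile time. */
#ifdef USE_SINGLE_PRECISION
typedef float real_t;
#else
typedef double real_t;
#endif

/* Writer side: value stored in the header's "precision bytes" field. */
static int32_t precision_bytes_for_build(void) {
    return (int32_t)sizeof(real_t);
}

/* Reader side: accept the file only if it was written with the same real
 * size as this build; otherwise every later read would be misinterpreted. */
static int precision_bytes_ok(int32_t precision_bytes_in_file) {
    if (precision_bytes_in_file != (int32_t)sizeof(real_t)) {
        fprintf(stderr,
                "legpol file has %d-byte reals, but this build uses %d-byte reals\n",
                (int)precision_bytes_in_file, (int)sizeof(real_t));
        return 0;
    }
    return 1;
}
```

Storing the size rather than a precision flag keeps the check generic if other real kinds are ever supported.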

@samhatfield samhatfield requested a review from Copilot June 11, 2026 12:14
@samhatfield samhatfield marked this pull request as ready for review June 11, 2026 12:14
@samhatfield samhatfield requested a review from wdeconinck June 11, 2026 12:14
@github-actions github-actions Bot requested a review from marsdeno June 11, 2026 12:14

Copilot AI left a comment

Contributor


Pull request overview

This PR adds an API test intended to validate that SETUP_TRANS can (1) write Legendre polynomials to disk and (2) re-initialize in the same process, reading those polynomials back from the same file. To support this, it also extends the Legendre polynomial on-disk header with the real-element byte size (IRBYTES) in both the CPU and GPU implementations.

Changes:

  • Add a new setup_trans API test that writes Legendre polynomials to disk and then reads them back in a second SETUP_TRANS call.
  • Extend Legendre polynomial file headers (CPU/GPU) to include IRBYTES and use it when sizing binary reads/writes.
  • Update CMake test list/excludes to register the new test and skip it for MPI>0 configurations.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • tests/trans/api/setup_trans/setup_trans_test_suite.F90: adds the new write/read Legendre polynomial test case.
  • tests/trans/api/setup_trans/CMakeLists.txt: registers the new test and excludes it for MPI configurations.
  • src/trans/gpu/internal/write_legpol_mod.F90: writes the extended Legendre header including IRBYTES (GPU path).
  • src/trans/gpu/internal/read_legpol_mod.F90: reads the extended Legendre header including IRBYTES (GPU path).
  • src/trans/cpu/internal/write_legpol_mod.F90: writes the extended Legendre header including IRBYTES (CPU path).
  • src/trans/cpu/internal/read_legpol_mod.F90: reads the extended Legendre header including IRBYTES (CPU path).


@wdeconinck

Collaborator

Due to the extra entry in HEADER, would you say that files generated before this PR can no longer be read?
If they are no longer backwards compatible, I would reserve some extra padding bytes in the header that could be filled later.
I would also add a version number that can be checked for compatibility.
It is important to try to stay backwards compatible in the future, so that caching mechanisms don't need to adapt.
It is probably OK to break backwards compatibility at this time, because MARS/MIR has moved away from using this in favour of an Atlas serial implementation for arbitrary grids.

@samhatfield samhatfield force-pushed the feat/add_legpoly_read_test branch from 6fb498d to 6a7728e June 23, 2026 15:57
@samhatfield

Collaborator Author

> Due to the extra entry in HEADER, would you say that generated files from before this PR are no longer compatible to be read in? If no longer backwards compatible, I would add reserve some extra bytes padded into the header that could possibly be filled later. I would also add a version number which can be checked for compatibility. It is important to try to be backwards compatible for the future, so that caching mechanisms don't need to adapt. It is probably OK to break backwards compatibility at this time because MARS/MIR has moved away from using this, in favour of an Atlas-serial implementation for arbitrary grids.

Correct, they're no longer backwards compatible. I've added 4 bytes to the header for storing the packed version integer, and two extra integers in case of future changes. Does that look good to you?

@wdeconinck

Collaborator

Hi @samhatfield, by version I meant the version of the file format, not necessarily the ecTrans version (although that is also good to know). The file format version only gets bumped when we actually change something in the file format. We can set it to 1 now, considering 0 to be the previous version.

If we are to design this properly, and we have the chance now, I suggest the header contain the following:

<"ECTRANS_LEGPOL  ":16*char> <BOM:int32> <file_format_version:int32> <ectrans_version:int32> <polynomial_type:8*char> <spectral_truncation:int32> <NGauss:int32> <precision_bytes:int32> <padding:x*bytes>
  1. 16 bytes: a string we can check to confirm that the file has the expected format.
  2. 4 bytes: the BOM, a byte-order marker used to detect the endianness of the written data. Typically it is written to the file as an int32 with a known hexadecimal value:
    use, intrinsic :: iso_fortran_env, only: int32
    integer(int32) :: BOM
    BOM = int(z'12345678', int32)
    When this is read back in and the BOM is z'78563412', the endianness is not the native one.
  3. 4 bytes: the version of this file format. If we change anything, the change has to come after this field. We can set this to 1 now, as in version 1.
  4. 4 bytes: the version of ecTrans used (useful but not crucial).
  5. 8 bytes: polynomial type.
  6. 4 bytes: spectral truncation.
  7. 4 bytes: Gaussian number.
  8. 4 bytes: precision in bytes.
  9. X bytes: padding.
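The BOM check in item 2 might look like the following minimal C sketch (an illustration, not ecTrans code; `LEGPOL_BOM`, `bswap32` and `bom_check` are hypothetical names). It reinterprets the four raw bytes in native order and compares them against both 0x12345678 and its byte-swapped form 0x78563412, so a foreign-endian file can be told apart from a corrupt one.

```c
#include <stdint.h>
#include <string.h>

/* Value written into the 4-byte byte-order-marker (BOM) field. */
#define LEGPOL_BOM UINT32_C(0x12345678)

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Classify the BOM bytes as read from the file:
 *   1 -> written with native endianness
 *   0 -> written with the opposite endianness (0x78563412 seen natively)
 *  -1 -> not a valid BOM (corrupt file or wrong format) */
static int bom_check(const unsigned char raw[4]) {
    uint32_t bom;
    memcpy(&bom, raw, sizeof bom); /* reinterpret the raw bytes natively */
    if (bom == LEGPOL_BOM) return 1;
    if (bom == bswap32(LEGPOL_BOM)) return 0;
    return -1;
}
```

A reader could either abort on 0 or byte-swap every subsequent field; aborting is the simpler choice for a cache file.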

The ordering of items 4 to 8 does not matter, but the current ordering, which matches what was already there, is a good choice.
We can choose the padding (X bytes) to be arbitrarily large now. We should not try to make the header very small, since the data that follows is huge by comparison. For instance, with X=16 the header will be 64 bytes in total.

Finally, after all the data has been written, I'd also add another string, "ECTRANS_LEGPOL_END".

Then, when reading the file back in, we need to add assertions that:

  • the file format string matches "ECTRANS_LEGPOL"
  • the endianness is as expected (native)
  • the file format version is as expected

Then we can safely read in the rest.

Finally, verify that we encounter "ECTRANS_LEGPOL_END".
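The read-side assertions could be sketched in C as follows, assuming the proposed 16-byte magic string, a 4-byte BOM and a 4-byte format version at the start, and "ECTRANS_LEGPOL_END" as the final 18 bytes of the file. The status codes and function name are hypothetical. Note that ISO C does not require binary streams to support SEEK_END meaningfully, although POSIX systems do.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LEGPOL_MAGIC      "ECTRANS_LEGPOL  "   /* 16 bytes, space padded */
#define LEGPOL_END        "ECTRANS_LEGPOL_END" /* 18 bytes */
#define LEGPOL_FORMAT_VER 1

/* Status codes returned by the validation sketch. */
enum { LEGPOL_OK = 0, LEGPOL_BAD_MAGIC, LEGPOL_FOREIGN_ENDIAN, LEGPOL_BAD_BOM,
       LEGPOL_BAD_VERSION, LEGPOL_TRUNCATED };

/* Check the magic string, BOM and format version, then the trailing end
 * marker. On success the stream is left just after the version field so
 * the caller can continue reading the header. */
static int legpol_validate(FILE *f) {
    char magic[16], tail[sizeof LEGPOL_END - 1];
    uint32_t bom;
    int32_t version;
    long after_version;

    if (fread(magic, 1, 16, f) != 16 || memcmp(magic, LEGPOL_MAGIC, 16) != 0)
        return LEGPOL_BAD_MAGIC;
    if (fread(&bom, 4, 1, f) != 1) return LEGPOL_TRUNCATED;
    if (bom == 0x78563412u) return LEGPOL_FOREIGN_ENDIAN;
    if (bom != 0x12345678u) return LEGPOL_BAD_BOM;
    if (fread(&version, 4, 1, f) != 1) return LEGPOL_TRUNCATED;
    if (version != LEGPOL_FORMAT_VER) return LEGPOL_BAD_VERSION;

    /* Jump to the last bytes of the file and look for the end marker. */
    after_version = ftell(f);
    if (fseek(f, -(long)sizeof tail, SEEK_END) != 0 ||
        fread(tail, 1, sizeof tail, f) != sizeof tail ||
        memcmp(tail, LEGPOL_END, sizeof tail) != 0)
        return LEGPOL_TRUNCATED;
    fseek(f, after_version, SEEK_SET);
    return LEGPOL_OK;
}
```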

@samhatfield

Collaborator Author

All good ideas. Let me take a look.

@samhatfield samhatfield force-pushed the feat/add_legpoly_read_test branch from ad52119 to 92581d1 June 24, 2026 15:06
@samhatfield

samhatfield commented Jun 24, 2026

Collaborator Author

@wdeconinck the header now looks like the following:

! Layout:
! 1. 20 bytes: the string "ECTRANS_LEGPOL_START" indicating the start of the header
! 2. 4 bytes: byte-order-marker indicating endianness of this platform
! 3. 4 bytes: version of the polynomial file
! 4. 4 bytes: version of ecTrans as packed integer
! 5. 8 bytes: polynomial type (one of the strings "LEGPOLBF" or "LEGPOL  ")
! 6. 4 bytes: spectral truncation
! 7. 4 bytes: number of northern latitudes
! 8. 4 bytes: size of real numbers in bytes
! 9. 32 bytes: padding reserved in case of future use
! 10. 20 bytes: the string "ECTRANS_LEGPOL_FINAL"
! Total: 104 bytes
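As a quick arithmetic check of the layout above, a small C table of the field sizes reproduces the 104-byte total and gives each field's byte offset from the start of the file, which can help when inspecting a written file with a hex dump. This is an illustrative sketch; the names are hypothetical.

```c
#include <stddef.h>

/* Field sizes, in order, of the header layout listed above (items 1-10). */
static const size_t legpol_field_bytes[] = {
    20, /* "ECTRANS_LEGPOL_START" */
    4,  /* byte-order marker */
    4,  /* polynomial file version */
    4,  /* packed ecTrans version */
    8,  /* polynomial type */
    4,  /* spectral truncation */
    4,  /* number of northern latitudes */
    4,  /* size of reals in bytes */
    32, /* reserved padding */
    20, /* "ECTRANS_LEGPOL_FINAL" */
};
#define LEGPOL_NFIELDS (sizeof legpol_field_bytes / sizeof legpol_field_bytes[0])

/* Byte offset of field i (0-based) from the start of the file. */
static size_t legpol_field_offset(size_t i) {
    size_t off = 0;
    for (size_t k = 0; k < i; ++k) off += legpol_field_bytes[k];
    return off;
}

/* Total header size: the offset just past the last field. */
static size_t legpol_header_bytes(void) {
    return legpol_field_offset(LEGPOL_NFIELDS);
}
```

For example, the BOM starts at byte 20, the spectral truncation at byte 40 and the closing string at byte 84.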

I'm not that familiar with BOZ literals in Fortran, so if you could check how I've handled the BOM and whether I've done it correctly, that would be appreciated.

@wdeconinck

Collaborator

It's nice to have a HEADER_END marker; it does not necessarily have to be a large string.
I would put ECTRANS_LEGPOL_FINAL at the very end of the file (after writing all the actual data); this can be used to check whether a file is incomplete.
It would also be nice, if possible, to add to the header the number of bytes in the data section, i.e. everything following the HEADER_END marker up to (but not including) ECTRANS_LEGPOL_FINAL: some kind of precomputation of the expected number of bytes.
That could make it possible to read all the data into memory in advance, and also to jump directly to the ECTRANS_LEGPOL_FINAL string to verify that the file is complete.

@samhatfield

Collaborator Author

> It's nice to have a HEADER_END marker; it does not necessarily have to be a large string. I would use ECTRANS_LEGPOL_FINAL at the very end of the file (after writing all the actual data); This can be used to check if a file was incomplete.

There's already an end marker 'LEGPOL---EOF-EOF' which is checked in READ_LEGPOL.

> What would also be nice, if possible to add to the header is the number of bytes in the data section, following the HEADER_END section all the way upto (not including) ECTRANS_LEGPOL_FINAL. So some kind of precomputation of expected bytes... That could make it possible to read all the data into memory in advance, and also allow to file-jump to the ECTRANS_LEGPOL_FINAL string to verify the file is complete.

I've added this functionality to WRITE_LEGPOL, but it isn't currently used in READ_LEGPOL. I'm not sure how to skip ahead by a prescribed number of bytes like you suggest. Does BYTES_IO_WRITE increment an offset from the start of the file? Do I just need to read a dummy buffer of NBYTES and then check that the next n bytes matches the expected end marker ('LEGPOL---EOF-EOF')? I suppose I would then need to rewind to finish reading the header.
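One way to do the skip-ahead, sketched with plain C stdio so that it makes no assumption about how BYTES_IO_WRITE or its read counterpart track offsets: with byte-addressed access, the end marker sits at (header size + data-section size), so a reader can seek straight there, compare the 16 bytes with 'LEGPOL---EOF-EOF', and seek back to where it was, with no dummy buffer and no rewind. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define LEGPOL_EOF_MARKER "LEGPOL---EOF-EOF" /* 16-byte end marker */

/* Check that the end marker sits exactly at header_bytes + data_bytes,
 * where data_bytes is the precomputed data-section size from the header.
 * The stream position is restored afterwards, so header parsing can carry
 * on without a rewind. Returns 1 if the marker is found there, else 0. */
static int legpol_end_marker_ok(FILE *f, long header_bytes, long data_bytes) {
    char marker[sizeof LEGPOL_EOF_MARKER - 1];
    long saved = ftell(f);
    int ok;

    if (saved < 0) return 0;
    ok = fseek(f, header_bytes + data_bytes, SEEK_SET) == 0 &&
         fread(marker, 1, sizeof marker, f) == sizeof marker &&
         memcmp(marker, LEGPOL_EOF_MARKER, sizeof marker) == 0;
    fseek(f, saved, SEEK_SET); /* return to the caller's position */
    return ok;
}
```

In Fortran, stream access with the POS= specifier on READ provides an analogous absolute positioning, if the reader is moved to standard I/O.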

@wdeconinck

wdeconinck commented Jun 25, 2026 via email

Collaborator

@samhatfield

Collaborator Author

I checked whether the precomputed file size matches the actual size for the test, and it does:

Single precision
Expected file size = 112 (header) + 1063040 (body) + 16 (end marker) = 1063168 bytes. Matches actually-written polynomial file.

Double precision
Expected file size = 112 (header) + 2125440 (body) + 16 (end marker) = 2125568 bytes. Matches actually-written polynomial file.
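The size comparison above could also be automated as a check inside the test. A minimal C sketch, assuming the data-section size has already been read from the header; the helper names are hypothetical, and the usage checks below simply reuse the numbers reported here.

```c
#include <stdio.h>

/* Total file size implied by the header: fixed header, data section
 * (size recorded in the header) and the trailing end marker. */
static long legpol_expected_size(long header_bytes, long data_bytes,
                                 long marker_bytes) {
    return header_bytes + data_bytes + marker_bytes;
}

/* Size of an open binary file in bytes, or -1 on error. The stream
 * position is restored before returning. */
static long file_size(FILE *f) {
    long saved = ftell(f), size;
    if (saved < 0 || fseek(f, 0, SEEK_END) != 0) return -1;
    size = ftell(f);
    fseek(f, saved, SEEK_SET);
    return size;
}
```

A test would then assert `file_size(f) == legpol_expected_size(...)` after writing the polynomials.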


4 participants