Add with_rolled_lon to downscaling models by frodre · Pull Request #1237 · ai2cm/ace

frodre · 2026-06-06T21:42:35Z

PR 4 of 5 in the prime-meridian longitude stack (PRs 1–3 now merged to main). Lets a model re-express its grid in a seam-crossing coarse domain's longitude convention while sharing the trained network weights, so a single checkpoint can generate over a domain expressed west of 0 or east of 360.

Changes:

fme.downscaling.models.DiffusionModel.with_rolled_lon: rebuild the model through its constructor with full_fine_coords and static_inputs rolled to match the coarse grid, anchored on the western coarse-cell edge so the fine grid stays aligned to whole coarse cells; returns self when no roll is needed. Inference-only (rebuilding re-wraps the module under torch distributed).
fme.downscaling.predictors.serial_denoising.DenoisingMoEPredictor.with_rolled_lon: roll every expert (preserving the shared-grid invariant) and rebuild so the sigma dispatcher is reconstructed from the rolled experts.
fme.downscaling.data exports roll_lon_coords for the model layer.
fme.downscaling.test_models: tests for no-roll passthrough, coord shifting with shared weights (including value-level checks that coords and static data roll together, and that a double roll is a no-op), and coarse-cell alignment for a seam-crossing domain. MoE rolling tests live in test_serial_denoising next to the existing grid-validation test.
Test cleanup: shared cell_centered_coordinate helper in test_utils replaces per-file midpoint-coordinate constructions (test_models, test_config); removed a test and helper in test_models/test_serial_denoising that duplicated coverage from #1234 (Validate expert grid compatibility in DenoisingMoEPredictor.__init__).
Tests added
If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated
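The rebuild-with-rolled-coordinates idea in the change list above can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: `ToyModel` is a hypothetical stand-in, not `DiffusionModel`, and its `with_rolled_lon` only mimics the behavior the description states (shared weights, coordinates and static data rolled together, anchoring on the western coarse-cell edge, returning `self` when nothing needs to move). It assumes a uniform coarse grid with at least two cells and a global fine grid.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyModel:
    """Hypothetical stand-in for a downscaling model (not the real class)."""

    weights: np.ndarray   # stands in for the trained network parameters
    fine_lon: np.ndarray  # global, increasing fine-grid cell centers (degrees)
    static: np.ndarray    # static field with longitude on the last axis

    def with_rolled_lon(self, coarse_lon: np.ndarray) -> "ToyModel":
        # Assumes a uniform coarse grid with at least two cells.
        step = coarse_lon[1] - coarse_lon[0]
        west_edge = coarse_lon[0] - step / 2
        east_edge = coarse_lon[-1] + step / 2
        if west_edge >= 0.0 and east_edge <= 360.0:
            return self  # coarse domain does not cross the 0/360 seam
        # Re-express every fine center in [west_edge, west_edge + 360).
        target = west_edge + (self.fine_lon - west_edge) % 360.0
        if np.array_equal(target, self.fine_lon):
            return self  # already expressed in this convention
        shift = int(np.argmin(target))  # index of the new westernmost cell
        return ToyModel(
            weights=self.weights,  # same object: weights are shared, not copied
            fine_lon=np.roll(target, -shift),
            static=np.roll(self.static, -shift, axis=-1),
        )


model = ToyModel(
    weights=np.ones(3),
    fine_lon=np.arange(22.5, 360.0, 45.0),  # 8 fine cells, 45 degrees wide
    static=np.arange(8.0),
)
coarse_lon = np.array([-45.0, 45.0])  # two 90-degree coarse cells west and east of 0
rolled = model.with_rolled_lon(coarse_lon)
print(rolled.fine_lon)  # starts at -67.5, whose western edge is -90
print(rolled.static)    # static values moved by the same shift
```

In this toy setup the first rolled fine cell has its western edge at -90, the same as the first coarse cell, and calling `with_rolled_lon` again on `rolled` returns `rolled` itself.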

Base: main (PRs 1–3 of the stack merged)

Stack

PR	Head → Base	Title	Status
#1234	`refactor/moe-validate-experts-init` → `main`	Validate expert grid compatibility in `DenoisingMoEPredictor.__init__`	merged
#1235	`feature/lon-roll-primitives` → `main`	Add longitude roll primitives	merged
#1236	`feature/lon-roll-data-layer` → `main`	Roll seam-crossing longitudes in the data layer	merged
#1237	`feature/lon-roll-model` → `main`	Add with_rolled_lon to models	this PR
#1238	`feature/lon-roll-integration` → PR4	Roll the model in inference/predict/evaluator	open

…1234)

First in a 5-PR stack adding support for longitude domains that cross the 0/360 prime meridian in downscaling. This standalone hardening PR moves expert grid-compatibility validation into the predictor constructor so every construction path is protected, not just the config-build path: only the primary expert's coordinates are used for input prep and output coords, so an expert built on a mismatched grid would otherwise silently downscale onto the wrong grid.

Changes:

- `fme.downscaling.predictors.serial_denoising`: move `_validate_experts_compatible` from `DenoisingMoEConfig.build` into `DenoisingMoEPredictor.__init__`, so it holds for `build`, `from_state`, and future callers (e.g. `with_rolled_lon`).
- `fme.downscaling.test_models`: add `test_denoising_moe_predictor_rejects_mismatched_expert_grids`, constructing the predictor directly with mismatched-grid experts and asserting it raises.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `main`

### Stack

| PR | Head → Base | Title |
|----|-------------|-------|
| [#1234](#1234) | `refactor/moe-validate-experts-init` → `main` | Validate expert grid compatibility in `DenoisingMoEPredictor.__init__` |
| [#1235](#1235) | `feature/lon-roll-primitives` → PR1 | Add longitude roll primitives |
| [#1236](#1236) | `feature/lon-roll-data-layer` → PR2 | Roll seam-crossing longitudes in the data layer |
| [#1237](#1237) | `feature/lon-roll-model` → PR3 | Add with_rolled_lon to models |
| [#1238](#1238) | `feature/lon-roll-integration` → PR4 | Roll the model in inference/predict/evaluator |

)

PR 2 of 5 in the prime-meridian longitude stack. Adds the pure coordinate/data rolling utilities needed to re-express a global grid in a seam-crossing domain's convention. These have no production callers yet (later PRs wire them into the data and model layers), so they are reviewable in isolation with full unit coverage. The interval-based roll only triggers when an interval actually crosses the seam (`start < 0` or `stop > 360`), so in-range intervals are a no-op and non-global grids are left untouched.

Primitives overview (PR #1235)

These primitives are always used as a pair: find_roll_anchor (or find_roll_anchor_from_interval) computes the roll amount once; callers pass it to all subsequent roll_lon_coords and roll_lon_data so coordinates and field tensors shift by the same amount. Two downstream pathways use them:

- Dataset load: rolls each loaded grid into the user's configured lon_extent convention (PR #1236)
- Model setup: rolls the model's fine grid to match the incoming coarse batch's convention (PR #1237)

Changes:

- `fme.downscaling.data.utils`: add `ClosedInterval.finite_values`, `_requires_lon_roll`, `coords_require_lon_roll`, `find_roll_anchor`, `find_roll_anchor_from_interval`, `roll_lon_coords`, `roll_lon_data`, and private helpers `_validate_rollable_lon` and `_validate_monotonic_lon`.
- `roll_lon_coords` (1-D coordinate tensor) and `roll_lon_data` (N-D field tensor) form a parallel pair: both apply the same roll amount, but `roll_lon_coords` also remaps values to keep the result monotonically increasing, while `roll_lon_data` is a pure cyclic shift. Callers pre-compute the roll amount once via `find_roll_anchor` and pass it to both.
- `roll_latlon_coords` is not included here; it operates on a `LatLonCoordinates` struct rather than a raw tensor and belongs in the PR that first uses it.
- `fme.downscaling.data` (`__init__`): export the new roll helpers.
- `fme.downscaling.data.test_utils`: unit tests for roll amounts, seam-crossing conventions, round-trip invertibility, non-global/non-uniform rejection, and invalid input validation.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `refactor/moe-validate-experts-init` (PR 1)

### Stack

| PR | Head → Base | Title |
|----|-------------|-------|
| [#1234](#1234) | `refactor/moe-validate-experts-init` → `main` | Validate expert grid compatibility in `DenoisingMoEPredictor.__init__` |
| [#1235](#1235) | `feature/lon-roll-primitives` → PR1 | Add longitude roll primitives |
| [#1236](#1236) | `feature/lon-roll-data-layer` → PR2 | Roll seam-crossing longitudes in the data layer |
| [#1237](#1237) | `feature/lon-roll-model` → PR3 | Add with_rolled_lon to models |
| [#1238](#1238) | `feature/lon-roll-integration` → PR4 | Roll the model in inference/predict/evaluator |
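The "compute the roll amount once, apply it to both coordinates and data" pattern described for the roll primitives might look roughly like the NumPy sketch below. The `sketch_*` functions are hypothetical simplifications written for this example; they are not the repository's `find_roll_anchor`, `roll_lon_coords`, or `roll_lon_data`, and their signatures are assumptions. The sketch assumes a global, uniformly spaced, increasing grid of cell centers in [0, 360).

```python
import numpy as np


def sketch_requires_roll(start: float, stop: float) -> bool:
    """An interval crosses the 0/360 seam when it leaves [0, 360]."""
    return start < 0.0 or stop > 360.0


def sketch_find_anchor(lon: np.ndarray, start: float) -> int:
    """Index of the first cell center at or east of the wrapped interval start."""
    idx = int(np.searchsorted(lon, start % 360.0, side="left"))
    return idx % lon.size


def sketch_roll_coords(lon: np.ndarray, anchor: int, start: float) -> np.ndarray:
    """Cyclic shift, then remap into [start, start + 360) so values stay increasing."""
    rolled = np.roll(lon, -anchor)
    return start + (rolled - start) % 360.0


def sketch_roll_data(data: np.ndarray, anchor: int) -> np.ndarray:
    """Pure cyclic shift along the longitude (last) axis."""
    return np.roll(data, -anchor, axis=-1)


def sketch_roll_pair(lon, data, start, stop):
    """Compute the anchor once and apply it to both coords and data."""
    if not sketch_requires_roll(start, stop):
        return lon, data  # in-range interval: nothing to do
    anchor = sketch_find_anchor(lon, start)
    return sketch_roll_coords(lon, anchor, start), sketch_roll_data(data, anchor)


lon = np.array([45.0, 135.0, 225.0, 315.0])  # four 90-degree cells
data = np.arange(8).reshape(2, 4)            # (lat, lon) toy field

west_lon, west_data = sketch_roll_pair(lon, data, -90.0, 90.0)
east_lon, east_data = sketch_roll_pair(lon, data, 300.0, 420.0)
same_lon, same_data = sketch_roll_pair(lon, data, 0.0, 180.0)
print(west_lon, west_data[0])
print(east_lon, east_data[0])
```

Because the data roll is a pure cyclic shift, applying it again with the negated anchor restores the original field, which is the round-trip property the unit tests in this PR are described as covering.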

PR 3 of 5 in the prime-meridian longitude stack. Applies the roll primitives (PR 2) in the data layer so a longitude interval that crosses the 0/360 seam can be subset instead of raising `NotImplementedError`. In-range intervals resolve to a zero roll and behave exactly as before.

Changes:

- `fme.downscaling.data.datasets.HorizontalSubsetDataset`: roll data and coordinates into the requested interval's convention rather than raising on wraparound.
- `fme.downscaling.data.config`: extract `_build_aligned_subset_pair`, which rolls coarse and fine lon coords into the extent's convention (`_roll_lons_to_extent_convention`) before `adjust_fine_coord_range`, so fine/coarse subselection stays aligned across the seam.
- `fme.downscaling.data.static.StaticInputs.roll`: roll static fields and their lon coordinates to match.
- `fme.downscaling.data.test_config`, `fme.downscaling.data.test_datasets`, `fme.downscaling.data.test_static`: tests for seam-crossing subsetting (negative and >360 conventions), fine/coarse scale-factor preservation across the seam (even and odd downscale factors), end-to-end paired loader with a seam-crossing extent, and `StaticInputs.roll`.

Note: surfacing the coarse grid convention on `GriddedData`/`PairedGriddedData` (`coarse_latlon_coords`) was deferred to the integration PR after review discussion.

- [x] Tests added
- [ ] If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated

Base: `feature/lon-roll-primitives` (PR 2)

### Stack

| PR | Head → Base | Title |
|----|-------------|-------|
| [#1234](#1234) | `refactor/moe-validate-experts-init` → `main` | Validate expert grid compatibility in `DenoisingMoEPredictor.__init__` |
| [#1235](#1235) | `feature/lon-roll-primitives` → PR1 | Add longitude roll primitives |
| [#1236](#1236) | `feature/lon-roll-data-layer` → PR2 | Roll seam-crossing longitudes in the data layer |
| [#1237](#1237) | `feature/lon-roll-model` → PR3 | Add with_rolled_lon to models |
| [#1238](#1238) | `feature/lon-roll-integration` → PR4 | Roll the model in inference/predict/evaluator |
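To make the seam-crossing subsetting and the scale-factor preservation concrete, here is a minimal NumPy sketch. `sketch_subset` is a hypothetical helper written for this example, not `HorizontalSubsetDataset` or `_build_aligned_subset_pair`; unlike the behavior described above, it remaps every interval rather than skipping in-range ones, although the selected cells come out the same. It assumes global, uniform grids of cell centers and an interval whose bounds sit on coarse-cell edges.

```python
import numpy as np


def sketch_subset(lon: np.ndarray, start: float, stop: float) -> np.ndarray:
    """Centers of global cells inside [start, stop], in the interval's convention."""
    if stop - start > 360.0:
        raise ValueError("interval is wider than the globe")
    # Re-express every center in [start, start + 360); sorting a remapped
    # global increasing grid is equivalent to a cyclic shift.
    remapped = np.sort(start + (lon - start) % 360.0)
    return remapped[remapped <= stop]


coarse = np.array([45.0, 135.0, 225.0, 315.0])  # 90-degree coarse cells
fine_x2 = np.arange(22.5, 360.0, 45.0)          # downscale factor 2
fine_x3 = np.arange(15.0, 360.0, 30.0)          # downscale factor 3

# Negative convention: 90W to 90E expressed as [-90, 90].
coarse_sub = sketch_subset(coarse, -90.0, 90.0)
fine_sub_x2 = sketch_subset(fine_x2, -90.0, 90.0)
fine_sub_x3 = sketch_subset(fine_x3, -90.0, 90.0)
print(coarse_sub, fine_sub_x2, fine_sub_x3)

# >360 convention: the same kind of domain expressed as [270, 450].
coarse_sub_east = sketch_subset(coarse, 270.0, 450.0)
print(coarse_sub_east)
```

In this toy case the fine subsets contain exactly 2 and 3 cells per selected coarse cell, so the downscale factor survives the seam for both an even and an odd factor.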

Let models re-express their grid in a seam-crossing coarse domain's longitude convention while sharing network weights:

- DiffusionModel.with_rolled_lon rebuilds the model through its constructor with full_fine_coords and static_inputs rolled to match the coarse grid. The roll is anchored on the western coarse-cell edge so the fine grid stays aligned to whole coarse cells. Returns self when no roll is needed.
- DenoisingMoEPredictor.with_rolled_lon rolls every expert (preserving the shared-grid invariant) and rebuilds so the sigma dispatcher is reconstructed from the rolled experts.

Adds tests for no-roll passthrough, coord shifting with shared weights, idempotency, coarse-cell alignment, and rolling all MoE experts.

frodre · 2026-06-12T23:23:34Z

+
+        Returns self unchanged when coarse_lon does not cross the prime meridian.
+
+        Intended for inference only: rebuilding wraps the module in a second


This will have to be tackled if we want to train with patches across the meridian.


AnnaKwa

LGTM, minor comment about checking that a rolled model is not used in training

AnnaKwa · 2026-06-15T15:52:34Z

+
+        Returns self unchanged when coarse_lon does not cross the prime meridian.
+
+        Intended for inference only: rebuilding wraps the module in a second


Maybe just do evaluations with patches crossing the meridian first to see if this gap in training distribution is an issue or not.

AnnaKwa · 2026-06-15T16:16:40Z

+
+        Intended for inference only: rebuilding wraps the module in a second
+        DistributedDataParallel under torch distributed, which is a hazard for
+        gradient-synchronized training.


Can an attribute _is_longitude_rolled be added to the model, so that an assertion at training time would make the source of this error clear if someone tried to train with a checkpoint that had rolled coords?
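The guard suggested in this comment could look something like the sketch below. Only the attribute name `_is_longitude_rolled` comes from the comment; the `ToyModel` class, its `train_on_batch` method, and the error message are hypothetical and are not taken from the merged code. The sketch raises a `RuntimeError` rather than using a bare `assert`, an assumption made here so the check still runs when Python is started with `-O`.

```python
class ToyModel:
    """Hypothetical model carrying a flag that marks rolled coordinates."""

    def __init__(self, is_longitude_rolled: bool = False):
        self._is_longitude_rolled = is_longitude_rolled

    def with_rolled_lon(self) -> "ToyModel":
        # The rebuilt copy remembers that its grid was rolled.
        return ToyModel(is_longitude_rolled=True)

    def train_on_batch(self, batch: list) -> float:
        if self._is_longitude_rolled:
            raise RuntimeError(
                "Model has rolled longitude coordinates and is inference-only; "
                "train from the original, unrolled model instead."
            )
        return sum(batch) / len(batch)  # placeholder for a real training step


base = ToyModel()
rolled = base.with_rolled_lon()
print(base.train_on_batch([1.0, 2.0, 3.0]))  # the unrolled model trains normally

try:
    rolled.train_on_batch([1.0, 2.0, 3.0])
except RuntimeError as err:
    message = str(err)
    print(message)
```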

frodre force-pushed the feature/lon-roll-data-layer branch from 7455589 to 806b5cd June 9, 2026 21:25

Base automatically changed from feature/lon-roll-data-layer to main June 12, 2026 20:17

frodre force-pushed the feature/lon-roll-model branch from b77e2de to 39d11b0 June 12, 2026 21:24

frodre added 6 commits June 12, 2026 15:21

Fix utils rename use in models, consolidate coordinate creation

5e0ca29

Fix comments

d4028ae

Remove redundant moe test from test_models and use shared coordinate

3ee4ee6

comment clean up

4326050

Add value check for static input data

5d3d75b

More documentation tweaks

d3fc740

frodre commented Jun 12, 2026


Merge branch 'main' into feature/lon-roll-model

1c5d882

frodre marked this pull request as ready for review June 12, 2026 23:38

frodre added 2 commits June 12, 2026 16:39

Comment cleanup

c060398

Merge branch 'feature/lon-roll-model' of github.com:ai2cm/ace into wt…

69f4130

…/roll-lon-model

AnnaKwa approved these changes Jun 15, 2026


Add flag for rolled model, checked at train time to ensure false

c2b6aaa

frodre enabled auto-merge (squash) June 15, 2026 18:38

frodre merged commit 6bb72fa into main Jun 15, 2026
7 checks passed

frodre deleted the feature/lon-roll-model branch June 15, 2026 19:02

frodre merged 11 commits into main from feature/lon-roll-model
