Releases: ai2cm/ace
2026.5.1
For PyPI versioning consistency, this release includes and supersedes the 2026.5.0 release.
What's Changed
Fine-Tuning & Checkpoint Resume
New config options make it easier to resume or fine-tune from existing checkpoints:
- OptimizationConfig.resume_optimizer_ckpt_path: restore optimizer state when fine-tuning (#1043)
- EMAConfig.resume_ema_ckpt_path: resume from an EMA checkpoint (#1118)
- CheckpointStepperConfig: load stepper config directly from a checkpoint (#1103)
- Optimizer/EMA state is now included in epoch checkpoints (ckpt_{epoch:04d}.tar) (#1104)
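Epoch checkpoints now carry optimizer and EMA state and follow the ckpt_{epoch:04d}.tar naming pattern, so a resume script can find the newest one by parsing the zero-padded epoch number. The sketch below assumes all epoch checkpoints sit directly in one directory. The helpers epoch_checkpoint_name and latest_epoch_checkpoint are hypothetical and are not part of fme.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names produced by the pattern ckpt_{epoch:04d}.tar, e.g. ckpt_0012.tar.
# {epoch:04d} pads to at least four digits, so longer numbers are allowed too.
EPOCH_CKPT_RE = re.compile(r"^ckpt_(\d{4,})\.tar$")


def epoch_checkpoint_name(epoch: int) -> str:
    """Format an epoch number with the ckpt_{epoch:04d}.tar pattern."""
    return f"ckpt_{epoch:04d}.tar"


def latest_epoch_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the epoch checkpoint with the highest epoch number, or None.

    Hypothetical helper: assumes epoch checkpoints sit directly inside
    checkpoint_dir; files that do not match the pattern are ignored.
    """
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = EPOCH_CKPT_RE.match(path.name)
        if match is None:
            continue
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch, best_path = epoch, path
    return best_path
```

The helper compares parsed integers rather than sorting filenames, so the result stays correct even if an epoch number grows past four digits.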
Ensemble Inference
- Initial-condition (IC) ensemble support added to the evaluator and inference aggregators (#709)
New Models & Architecture
- filter_preserves_global_mean option added to SFNO (#1100)
- SecondaryModuleStepConfig/SecondaryModuleStep: compose a secondary module during training steps (#1073)
Coupled Model
- Stochastic CoupledStepper training (#750)
- Randomly sampled LossContributions.n_steps (#869)
- optimize_last_step_only added to coupled LossContributionsConfig (#868)
Diagnostics
- Power spectrum diagnostics logged in the inference entrypoint (#1078, #1079)
- Weather eval entrypoint replaced with a more general additional_inference list (#1096)
Data Processing
- Time subsetting can now be configured prior to time coarsening (#1055)
- PRMSL added to X-SHiELD data processing configurations (#1036)
Bug Fixes
- Clamped the SSR calculation, which was producing NaNs that were silently dropped from W&B (#1088)
- Worked around an xarray StringDType serialization error (#1086)
- Signal handler now exits with a nonzero code (#1068)
- IceCorrectorConfig is now correctly registered in the CorrectorSelector registry (#1044)
Breaking Changes
- TrainStepperConfig.train_n_forward_steps renamed to TrainStepperConfig.n_forward_steps; all train YAML configs must update this field (#1052)
- TrainConfig.n_forward_steps removed (was deprecated; use stepper_training.n_forward_steps) (#1052)
- TrainConfig.weather_evaluation: WeatherEvaluationConfig | None replaced by TrainConfig.additional_inference: list[AdditionalInferenceConfig] (#1096)
- Sub-aggregator record_batch(time, data) interface replaced by record_batch(data: InferenceBatchData) (#1097)
- StepLoss.forward() now returns LossOutput instead of torch.Tensor; call .total() for the scalar (#1020)
- fme.diffusion package removed (#1084)
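The two n_forward_steps changes from #1052 can be applied to existing configs mechanically. The sketch below makes three assumptions:

- The train YAML has already been loaded into a plain Python dict, using a YAML parser of your choice.
- TrainConfig's stepper_training field appears as a top-level stepper_training key.
- The top-level n_forward_steps key corresponds to the removed TrainConfig.n_forward_steps.

The helper migrate_n_forward_steps is hypothetical and is not an fme API.

```python
from typing import Any


def _move_key(src: dict[str, Any], old: str, dst: dict[str, Any], new: str, where: str) -> None:
    """Move src[old] to dst[new], refusing to overwrite a different value."""
    if old not in src:
        return
    value = src.pop(old)
    current = dst.setdefault(new, value)
    if current != value:
        raise ValueError(f"{where}: conflicting values {value!r} and {current!r}")


def migrate_n_forward_steps(train_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a train config dict updated for the #1052 changes.

    Hypothetical helper; it does not validate the rest of the config.
    """
    config = dict(train_config)  # shallow copy; the input dict is left untouched
    stepper_training = dict(config.get("stepper_training") or {})
    # Rename: TrainStepperConfig.train_n_forward_steps -> n_forward_steps.
    _move_key(stepper_training, "train_n_forward_steps",
              stepper_training, "n_forward_steps", "stepper_training")
    # Removal: top-level TrainConfig.n_forward_steps -> stepper_training.n_forward_steps.
    _move_key(config, "n_forward_steps",
              stepper_training, "n_forward_steps", "top-level n_forward_steps")
    if stepper_training:
        config["stepper_training"] = stepper_training
    return config
```

When two different values are found, the helper raises instead of picking one. This avoids silently changing the rollout length a config trains with.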
Full Changelog: v2026.4.0...v2026.5.1
v2026.5.0
What's Changed
- Allow stochastic CoupledStepper training by @jpdunc23 in #750
- Ensure IceCorrectorConfig is added to CorrectorSelector registry by @William-gregory in #1044
- Add link to colab notebook example by @oliverwm1 in #1047
- Add optimize_last_step_only to coupled LossContributionsConfig by @jpdunc23 in #868
- Update dataset path for era5 data processing config by @mcgibbon in #1051
- Enable configuring time subsetting prior to time coarsening by @spencerkclark in #1055
- Add script to copy gcs data to weka by @mcgibbon in #1058
- Remove deprecated n_forward_steps from TrainConfig by @mcgibbon in #1052
- Downscaling loss weighting by @AnnaKwa in #1056
- Refactor DiffusionModel.generate by @AnnaKwa in #1060
- Set explicit 30-minute timeout on all init_process_group calls by @mcgibbon in #1069
- Allow Distributed.get_instance() without context for single-rank by @mcgibbon in #1070
- Add Claude Code transcript logging as pr comment by @mcgibbon in #1072
- Fork DISCO convolution with FFT-based contraction into fme/core/disco by @mcgibbon in #1066
- Add new test_ice_train script to catch bugs which affect GraphCast and IceCorrector by @William-gregory in #1054
- Extract TensorDictAccumulator primitive by @mcgibbon in #1074
- Remove barrier from ModelTorchDistributed.shutdown() by @mcgibbon in #1067
- Randomly sampled coupled LossContributions.n_steps by @jpdunc23 in #869
- Exit with nonzero code in signal handler by @mcgibbon in #1068
- Set seed in fme.coupled training integration test by @jpdunc23 in #1075
- Updates repo name in perlmutter example make-venv by @yikwill in #1063
- Update baseline Beaker budget to "atec-climate" by @brianhenn in #1080
- Log power spectrum diagnostics in inference entrypoint by @spencerkclark in #1078
- from_state methods give on-device, memory-decoupled objects by @mcgibbon in #1061
- Add SecondaryModuleStepConfig and SecondaryModuleStep by @mcgibbon in #1073
- Remove unused fme/diffusion folder by @oliverwm1 in #1084
- Implement get_dataset for power spectrum aggregators by @spencerkclark in #1079
- Have losses return non-reduced tensors by @Arcomano1234 in #1020
- Work around xarray StringDType serialization error by @spencerkclark in #1086
- Fix/n ensemble attributes batch data by @Arcomano1234 in #1085
- DenoisingMoEPredictor by @AnnaKwa in #1071
- Decouple noise floor argo workflow from fme by @spencerkclark in #1083
- Add bottleneck_attention option to diffusion model config by @AnnaKwa in #1094
- Add PRMSL to X-SHiELD data processing configurations by @spencerkclark in #1036
- Add IC ensemble ability to evaluator and update inference aggregators for ensembles by @Arcomano1234 in #709
- Replace weather eval with additional inference by @mcgibbon in #1096
- Add isotropic Morlet filter basis for DISCO by @mcgibbon in #1093
- Enable DenoisingMoEPredictor in inference by @AnnaKwa in #1059
- Add script to extract Stepper checkpoints from a CoupledStepper checkpoint by @jpdunc23 in #1105
- Fix latents, inputs order in mixture of experts generate by @AnnaKwa in #1106
- Remove scripts/test_distributed_context.py given #1070 by @jpdunc23 in #1107
- Uniform sub-aggregator interface via InferenceBatchData by @mcgibbon in #1097
- Add filter_preserves_global_mean option to SFNO by @mcgibbon in #1100
- Clamp SSR calculation that was leading to NaNs and silently dropped from Wandb by @Arcomano1234 in #1088
- Add CheckpointStepperConfig to load stepper config from checkpoint by @mcgibbon in #1103
- Thin coordinator: move builder logic to config.build() by @mcgibbon in #1098
New Contributors
- @William-gregory made their first contribution in #1044
- @yikwill made their first contribution in #1063
Full Changelog: v2026.4.0...v2026.5.0
v2026.4.0
Release date: April 9, 2026
What's Changed
A subset of changes are listed here, see full changelog for more detail: v2026.1.1...v2026.4.0
⚠️ Breaking Changes
- fme.ace and fme.coupled training configs: Training-only fields (loss, optimize_last_step_only, n_ensemble, parameter_init, train_n_forward_steps) have been removed from StepperConfig and must now be set under a new top-level stepper_training: TrainStepperConfig field. Existing training configs will need to be updated. (#862)
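The #862 move can be scripted for existing configs. The sketch below relocates the listed training-only fields from the stepper section into a top-level stepper_training section of a config already loaded into a dict. The default stepper_key="stepper" is an assumption about where StepperConfig lives in a train YAML; check it against your own configs, particularly for fme.coupled. The function move_training_fields is hypothetical and is not an fme API.

```python
from typing import Any

# Fields that #862 moved out of StepperConfig into TrainStepperConfig.
TRAINING_ONLY_FIELDS = (
    "loss",
    "optimize_last_step_only",
    "n_ensemble",
    "parameter_init",
    "train_n_forward_steps",
)


def move_training_fields(
    train_config: dict[str, Any], stepper_key: str = "stepper"
) -> dict[str, Any]:
    """Return a copy of a train config dict with training-only fields moved
    from the stepper section into a top-level stepper_training section.

    Hypothetical helper. stepper_key is an ASSUMPTION about which top-level
    key holds StepperConfig in your YAML.
    """
    config = dict(train_config)  # shallow copy; the input dict is left untouched
    stepper = dict(config.get(stepper_key) or {})
    stepper_training = dict(config.get("stepper_training") or {})
    for field in TRAINING_ONLY_FIELDS:
        if field not in stepper:
            continue
        if field in stepper_training:
            raise ValueError(
                f"{field!r} is set under both {stepper_key!r} and 'stepper_training'"
            )
        stepper_training[field] = stepper.pop(field)
    if stepper_key in config:
        config[stepper_key] = stepper
    if stepper_training:
        config["stepper_training"] = stepper_training
    return config
```

This step only moves fields; it does not rename them. A later release renames train_n_forward_steps to n_forward_steps (#1052), so configs targeting that release need that rename as well.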
New Config Options
- metrics_log_dir on LoggingConfig: Log W&B scalar metrics to a local JSONL file on disk in addition to W&B. (#992)
- Configurable inference step logging: Control which inference steps are logged to W&B. (#883)
- ValidationConfig on InferenceEvaluatorConfig (fme.ace): Optionally run a validation pass before inference and log metrics to step 0 of the W&B run. (#878)
- LRTuningConfig on TrainConfig (fme.ace, fme.coupled, fme.diffusion): Automatically tune the learning rate at configurable epochs by running short, isolated comparison trials between the current and a candidate LR, with no restarts required. (#930)
- prescribed_prognostic_names on SingleModuleStepperConfig: Override named prognostic variables with ground-truth values at each inference/eval timestep. Intended to be set via stepper_override in eval configs. (#810)
- Optional left/two-tailed PDF metrics for downscaling training. (#994)
- LossVsNoiseAggregator for downscaling: Tracks loss as a function of noise level during diffusion training. (#1025)
- Configurable training noise distribution. (#874)
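Because metrics_log_dir writes scalar metrics as JSONL (one JSON object per line), the log can be read back without W&B. These notes do not specify the file name inside metrics_log_dir or the keys in each record. The sketch below therefore takes an explicit file path and treats records as generic dicts. The helpers load_metrics_jsonl and metric_series are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_metrics_jsonl(path: str) -> list[dict[str, Any]]:
    """Read a JSONL file into a list of dicts, one per non-empty line.

    Hypothetical helper: assumes each line holds a single JSON object, as
    the JSONL format prescribes; record keys are not interpreted.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            records.append(record)
    return records


def metric_series(records: list[dict[str, Any]], key: str) -> list[Any]:
    """Collect the values of one metric, skipping records that lack it."""
    return [record[key] for record in records if key in record]
```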
Deprecations
- sea_ice_thickness_name on SeaIceFractionConfig (ocean corrector): Deprecated in favor of the more general zero_where_ice_free_names list, which supports correcting multiple outputs. (#843)
- CascadePredictor (downscaling): Deprecated and removed. (#970)
- Topography pathway on downscaling DataLoaderConfig/PairedDataLoaderConfig: Deprecated; use StaticInputs instead. (#926)
Notable Behavioral Fixes
HiRO-ACE Release
This release is the official milestone for our team's move to fully open development, and includes the latest updates for HiRO-ACE as described in our paper. HiRO-ACE is a two-stage emulation framework for generating 3 km resolution precipitation outputs. In the first stage, a stochastic climate emulator (ACE2S) generates 100 km climate simulations. In the second, a downscaling model (HiRO) generates 3 km precipitation outputs.
See the docs for a quickstart on installation and use, and our Hugging Face repo for the models and some sample data to run them on.
Open Development
Previously, the ACE repo held only updates related to papers and releases, while most development happened behind the scenes in a separate repository. This made it harder for external collaborators to contribute and for users to track development progress. We hope that moving all of our development here brings us closer to users and makes it easier for people outside our group to give feedback, report issues, and contribute.
Updates
- Don't upload big maps by @AnnaKwa in #707
- Add StaticInputs class by @AnnaKwa in #713
- Add hiro ckpt train config by @AnnaKwa in #721
- Provide backwards compatibility for list-type BatchLabels by @mcgibbon in #722
- Serialize static inputs with downscaling model by @AnnaKwa in #727
- Beaker CI test via gantry by @brianhenn in #723
- Remove filter repo tools by @brianhenn in #729
- Add training configs for ACE2S used in HiRO - ACE manuscript by @Arcomano1234 in #710
- Pass model static inputs to dataset build calls at generation by @AnnaKwa in #728
- Call optimizer autocast in stepper predict generator by @mcgibbon in #733
- Ensure topography is on device in downscaling inference by @AnnaKwa in #731
- Coupled stepper config removes deprecated crps_training key by @elynnwu in #734
- Samudra bugfix: Use circular padding for longitude axis by @elynnwu in #735
- Add additional diagnostics of the OHC budget by @jpdunc23 in #737
- Prevent backpropagation anomalies in energy corrector by @spencerkclark in #724
- Fix bug causing step sampler to be ignored by @mcgibbon in #742
- Add a contributing guideline by @oliverwm1 in #730
- Increase timeout of NCCL collective operations to 20 minutes by @jpdunc23 in #746
- Add docs page for downscaling inference by @AnnaKwa in #743
- Vendorize Apache 2.0 Nvidia Downscaling Code by @frodre in #748
- Enforce lat bounds (-88 deg, 88 deg) by @AnnaKwa in #740
- Bump version v2026.1.1 for HiRO-ACE release by @frodre in #751
Full Changelog: v2026.1.0...v2026.1.1
v2026.1.0
Release marking the switch to open development for the ai2cm team.
What's Changed
- Docs CI update by @brianhenn in #711
- Bump version to 2026.1.0 by @brianhenn in #715
Full Changelog: https://github.com/ai2cm/ace/commits/v2026.1.0
2025.11.0
Release date: November 7, 2025
Full Changelog: 2025.10.0...2025.11.0
What's Changed
Based on user feedback, we updated the versions of the fme dependencies torch-harmonics (0.7.4 --> 0.8.0) and imageio (<2.27.0 --> >2.28.1).
2025.10.0
Release date: October 16, 2025
Full Changelog: 2025.7.0...2025.10.0
What's Changed
This release includes the capability to run coupled models (such as those emulating the atmosphere, ocean, and sea ice!) via entrypoints in fme.coupled. We have provided documentation for running inference using coupled model weights.
The deprecated legacy training configuration format (SingleModuleStepperConfig) has been removed in this release. However, breaking changes have been avoided and backwards compatibility has been maintained with existing saved models for most cases.
2025.7.0
What's Changed
This release includes major internal refactors and improved documentation. The previous training configuration format has been deprecated and will be removed in a future release. However, breaking changes have been avoided and backwards compatibility has been maintained with existing saved models for most cases.
Version updates:
- Python 3.11 and torch 2.7.1
Internal refactors:
- The fme package has been moved one level up (i.e., away from the legacy fme/fme/... layout and to fme/ace/ and fme/core/ instead).
Increased modularity for ML emulation:
- Training configuration is now based around a more flexible StepperConfig; the legacy SingleModuleStepperConfig is deprecated and will be removed in a future release.
- The stepper config now supports the modular step framework, allowing composable steps for ML emulation.
Experimental features:
- Samudra, a global ocean emulator developed by M2LInES, is now fully integrated into Ai2's full model framework. An example production workflow for training and running Samudra is currently under development and will be included in the upcoming release.
Documentation
- Added an improved quickstart.rst focused around the models saved in our Hugging Face collection.
Full Changelog: 2024.12.0...2025.7.0
2024.12.0
What's Changed
This release contains many internal changes for ACE code. However, all configuration options accessible by the entrypoints of the fme package (i.e. fme.ace.train, fme.ace.inference and fme.ace.evaluator) have had no breaking changes.
The following lists are not complete but just a highlight of changes which may be relevant to users.
Bug fixes:
- resolved a transient bug that sometimes occurred in XarrayDataset when trying to read the image shape from a scalar field
- when using n_repeats greater than 1, XarrayDataset now correctly increments the values in the returned time arrays
New features:
- ACE works on Apple Silicon! Set the environment variable FME_USE_MPS=1 to use the PyTorch MPS backend. Make sure to have the latest version of PyTorch installed. This gives about a 5x speedup over running on CPU (tested on a MacBook Pro M3 Max).
- add perturbations to sea surface temperature during inference (see ForcingDataLoaderConfig.perturbations)
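To enable the Apple Silicon backend from a Python launcher script, pass FME_USE_MPS=1 in the child process's environment. Beforehand, torch.backends.mps.is_available() reports whether your PyTorch build can use MPS at all. The sketch below assumes the entrypoint is started as python -m fme.ace.inference <config>. The function mps_launch_spec and the config file name are placeholders, not part of fme.

```python
import os
import sys


def mps_launch_spec(
    config_path: str, entrypoint: str = "fme.ace.inference"
) -> tuple[list[str], dict[str, str]]:
    """Build a command and environment that run an fme entrypoint with
    FME_USE_MPS=1, which selects the PyTorch MPS backend on Apple Silicon.

    Hypothetical helper; assumes "python -m <entrypoint> <config>" launching.
    """
    env = dict(os.environ)  # copy, so the parent process environment is unchanged
    env["FME_USE_MPS"] = "1"
    cmd = [sys.executable, "-m", entrypoint, config_path]
    return cmd, env


# Usage (not run here; inference_config.yaml is a placeholder path):
#   import subprocess
#   cmd, env = mps_launch_spec("inference_config.yaml")
#   subprocess.run(cmd, env=env, check=True)
```

Passing a modified copy of the environment keeps the setting scoped to the child process, so other code in the launcher keeps its default device selection.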
Refactors:
- deduplicated some inference code by using generics; the fme.ace.inference and fme.ace.evaluator entrypoints now share more code.
Full Changelog: 2024.9.0...2024.12.0
2024.9.0
What's Changed
- Update README to link to zenodo repo with checkpoint by @oliverwm1 in #3
- New public release of FME code by @oliverwm1 in #5
- Fix instruction for installing from GitHub by @oliverwm1 in #7
- Add readthedocs config by @mcgibbon in #6
- Add docs badge and link by @oliverwm1 in #8
- Add link to zenodo archive with checkpoint by @oliverwm1 in #9
- Add link to E3SMv2-trained paper and checkpoint by @oliverwm1 in #12
- Add link to published EAMv2 paper in JGR-ML by @jpdunc23 in #16
- Add missing init files by @oliverwm1 in #17
- Update for PyPI release by @frodre in #20
New Contributors
- @oliverwm1 made their first contribution in #3
- @mcgibbon made their first contribution in #6
- @jpdunc23 made their first contribution in #16
- @frodre made their first contribution in #20
Full Changelog: 2023.12.0...2024.9.0