Add CUDA ring FFT backend for SHT by ssmmnn11 · Pull Request #1165 · ecmwf/anemoi-core

ssmmnn11 · 2026-06-03T10:35:41Z

cufft

📚 Documentation preview 📚: https://anemoi-training--1165.org.readthedocs.build/en/1165/

📚 Documentation preview 📚: https://anemoi-graphs--1165.org.readthedocs.build/en/1165/

📚 Documentation preview 📚: https://anemoi-models--1165.org.readthedocs.build/en/1165/

samhatfield · 2026-06-10T17:28:29Z

@ssmmnn11 any tips for the JIT compilation? I tried using SphericalHarmonicTransform with use_cuda_ring_fft=True, but I get a gargantuan quantity of compilation errors, e.g.:

RuntimeError: Error building extension 'anemoi_ring_fft': [1/3] nvc++ -MMD -MF ring_fft.o.d -DTORCH_EXTENSION_NAME=anemoi_ring_fft -DTORCH_API_INCLUDE_EXTENSION_H -I/path/to/nvidia/25.3/Linux_aarch64/25.3/math_libs/include -isystem /path/to/python_envs/ag/lib/python3.12/site-packages/torch/include -isystem /path/to/python_envs/ag/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /path/to/nvidia/25.3/Linux_aarch64/25.3/compilers/include -isystem /path/to/python3/3.12.11-01/include/python3.12 -fPIC -std=c++17 -O3 -DANEMOI_RING_FFT_ENABLE_CUFFT -c /path/to/anemoi-core/models/src/anemoi/models/layers/cuda/ring_fft.cpp -o ring_fft.o
FAILED: [code=2] ring_fft.o
nvc++ -MMD -MF ring_fft.o.d -DTORCH_EXTENSION_NAME=anemoi_ring_fft -DTORCH_API_INCLUDE_EXTENSION_H -I/path/to/nvidia/25.3/Linux_aarch64/25.3/math_libs/include -isystem /path/to/python_envs/ag/lib/python3.12/site-packages/torch/include -isystem /path/to/python_envs/ag/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /path/to/nvidia/25.3/Linux_aarch64/25.3/compilers/include -isystem /path/to/python3/3.12.11-01/include/python3.12 -fPIC -std=c++17 -O3 -DANEMOI_RING_FFT_ENABLE_CUFFT -c /path/to/anemoi-core/models/src/anemoi/models/layers/cuda/ring_fft.cpp -o ring_fft.o
"/path/to/python_envs/ag/lib/python3.12/site-packages/torch/include/torch/headeronly/util/Half.h", line 85: error: identifier "float16_t" is undefined
    inline Half(float16_t value);
                ^

"/path/to/python_envs/ag/lib/python3.12/site-packages/torch/include/torch/headeronly/util/Half.h", line 86: error: expected an operator
    inline operator float16_t() const;
                    ^

"/path/to/python_envs/ag/lib/python3.12/site-packages/torch/include/torch/headeronly/util/Half.h", line 103: error: no suitable conversion function from "const c10::Half" to "float" exists
    out << (float)value;
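
The log above shows nvc++ being invoked as the host C++ compiler for ring_fft.cpp, and the errors come from PyTorch's own Half.h rather than from the extension source. PyTorch's JIT extension builder takes its host compiler from the CXX environment variable, so one thing that may be worth trying is pointing CXX at a GNU or Clang compiler before the extension is built. The sketch below is only an illustration of that idea: `pick_host_compiler` is a hypothetical helper, not part of anemoi-core or PyTorch, and whether switching compilers resolves the float16_t errors on this system is an untested assumption.

```python
import os
import shutil


def pick_host_compiler(env=None, candidates=("g++", "clang++"), which=shutil.which):
    """Return a host C++ compiler to export as CXX, avoiding nvc++.

    Hypothetical helper for illustration only. `env` defaults to os.environ,
    and `which` is injectable so the lookup can be tested without a toolchain.
    """
    env = os.environ if env is None else env
    current = env.get("CXX", "").strip()

    # Keep an explicit CXX unless it names nvc++ (possibly with a full path).
    if current and os.path.basename(current.split()[0]) != "nvc++":
        return current

    # Otherwise fall back to the first candidate found on PATH.
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    compiler = pick_host_compiler()
    if compiler is not None:
        # Must be set before the first use of the CUDA ring FFT backend,
        # because the extension is compiled on first use.
        os.environ["CXX"] = compiler
    print("CXX =", os.environ.get("CXX"))
```

If a previous failed build is cached, the extension build directory may also need clearing before retrying.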

samhatfield · 2026-06-26T14:54:08Z

I've rebased the main commit of this branch on top of main to simplify the history. At the same time, I temporarily removed some functionality to make reviewing easier. The CUDA backend is now off by default and must be activated explicitly by passing use_cuda_ring_fft=True. It is currently overridden by use_graphed_rfft=True; we can discuss the priority of these two options later, including how to let the user control both through env vars. I've also removed the "grouped" transform option, as that was just an idea and is not necessarily relevant now that we have better options.

This currently fails, indicating an issue with the CUDA backend.
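
For the later discussion on option priority, here is a minimal sketch of the precedence described above: the CUDA ring FFT is off by default, is enabled by use_cuda_ring_fft=True, and is overridden when use_graphed_rfft=True. The function name, the returned labels, and the two environment variable names are all hypothetical, chosen only to make the idea concrete; they are not the names used in anemoi-core.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def _flag(explicit, env, name):
    """An explicit argument wins; otherwise fall back to an env var (default off)."""
    if explicit is not None:
        return bool(explicit)
    return env.get(name, "").strip().lower() in _TRUTHY


def resolve_fft_backend(use_cuda_ring_fft=None, use_graphed_rfft=None, env=None):
    """Pick an FFT backend label from the two options.

    Hypothetical sketch: env var names and return labels are assumptions.
    """
    env = os.environ if env is None else env
    graphed = _flag(use_graphed_rfft, env, "ANEMOI_SHT_USE_GRAPHED_RFFT")
    cuda_ring = _flag(use_cuda_ring_fft, env, "ANEMOI_SHT_USE_CUDA_RING_FFT")

    # Current behaviour per the comment above: graphed rFFT takes priority.
    if graphed:
        return "graphed_rfft"
    if cuda_ring:
        return "cuda_ring_fft"
    return "default"


if __name__ == "__main__":
    print(resolve_fft_backend(use_cuda_ring_fft=True, env={}))
    print(resolve_fft_backend(use_cuda_ring_fft=True, use_graphed_rfft=True, env={}))
```

Resolving both flags in one place would keep the priority decision in a single function, so swapping the order later would be a small change.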

github-project-automation Bot added this to Anemoi-dev Jun 3, 2026

github-project-automation Bot moved this to To be triaged in Anemoi-dev Jun 3, 2026

ssmmnn11 assigned samhatfield Jun 3, 2026

github-actions Bot added training models labels Jun 3, 2026

ssmmnn11 marked this pull request as draft June 3, 2026 10:36

samhatfield reviewed Jun 4, 2026

Comment thread training/docs/modules/losses.rst Outdated

Base automatically changed from feat/graphs_sht_make_callable to main June 8, 2026 13:54