Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe
KDD '26 · ACM SIGKDD · August 9–13, 2026 · Jeju, Republic of Korea
@article{islam2026cosmo3dflow,
title = {Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral
Compression in Reconstructing the Early Universe},
author = {Islam, Md Khairul and Xia, Zeyu and Goudjil, Ryan and
Wang, Jialu and Farahi, Arya and Fox, Judy},
journal = {arXiv preprint arXiv:2602.10172},
year = {2026}
}
Cosmo3DFlow reconstructs early-Universe initial conditions from present-day observations using 3D Wavelet Flow Matching — operating entirely in wavelet space for a 50× speedup over diffusion baselines.
- 50× faster sampling than score-based diffusion (5.2 s vs. 243 s at 128³)
- 8× spatial compression via single-level 3D Haar DWT
- 10× fewer ODE steps · 2× less memory · better reconstruction quality
~63.7% of cosmic volume is empty voids holding only 16.2% of dark matter mass — yet voxel-space models spend equal compute everywhere. The 3D DWT converts spatial emptiness into spectral sparsity, concentrating compute on physically meaningful filaments and halos.
Fig. 1 — Voxel vs. wavelet representation of the cosmic web. Left: a voxel grid distributes compute uniformly across all 2.1 M cells at 128³, despite ~63.7% being near-empty cosmic voids. Right: a single-level 3D Haar DWT makes sparsity explicit — voids collapse to near-zero high-frequency coefficients, while filaments and dark matter halos retain rich fine-grained detail. This 8× spatial compression is the foundation of Cosmo3DFlow's efficiency gains.
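The void-to-sparsity argument above can be checked on a toy grid. The following is a minimal NumPy sketch of a single-level orthonormal 3D Haar DWT and its inverse, not the repository's implementation; the function names `haar_dwt3d` and `haar_idwt3d`, the 16³ toy field, and the subband ordering are assumptions made for illustration.

```python
import numpy as np

def haar_dwt3d(x):
    """Single-level orthonormal 3D Haar DWT (illustrative sketch).

    x: array of shape (N, N, N) with even N.
    Returns shape (8, N/2, N/2, N/2); index 0 is the low-pass band,
    indices 1..7 are the detail (high-frequency) bands.
    """
    bands = [x]
    for axis in range(3):
        nxt = []
        for b in bands:
            even = np.take(b, np.arange(0, b.shape[axis], 2), axis=axis)
            odd = np.take(b, np.arange(1, b.shape[axis], 2), axis=axis)
            nxt.append((even + odd) / np.sqrt(2.0))  # low-pass along this axis
            nxt.append((even - odd) / np.sqrt(2.0))  # high-pass along this axis
        bands = nxt
    return np.stack(bands)

def haar_idwt3d(coeffs):
    """Inverse of haar_dwt3d: (8, M, M, M) -> (2M, 2M, 2M)."""
    bands = list(coeffs)
    for axis in (2, 1, 0):  # undo the axes in reverse order
        nxt = []
        for lo, hi in zip(bands[0::2], bands[1::2]):
            shape = list(lo.shape)
            shape[axis] *= 2
            out = np.empty(shape, dtype=lo.dtype)
            even_idx = [slice(None)] * 3
            odd_idx = [slice(None)] * 3
            even_idx[axis] = slice(0, None, 2)
            odd_idx[axis] = slice(1, None, 2)
            out[tuple(even_idx)] = (lo + hi) / np.sqrt(2.0)
            out[tuple(odd_idx)] = (lo - hi) / np.sqrt(2.0)
            nxt.append(out)
        bands = nxt
    return bands[0]

# Toy "cosmic web": an empty void with one small dense region (hypothetical data).
rng = np.random.default_rng(0)
field = np.zeros((16, 16, 16))
field[4:8, 4:8, 4:8] = rng.random((4, 4, 4)) + 1.0

coeffs = haar_dwt3d(field)                        # 8 subbands of 8^3 each
zero_fraction = float(np.mean(coeffs == 0.0))     # voids give exact zeros
recon = haar_idwt3d(coeffs)
```

Each subband holds one eighth of the voxels, which is where the 8× spatial compression per channel comes from, and the transform is exactly invertible, so nothing is lost by working in wavelet space.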
Flow matching trained entirely in wavelet space: apply 3D Haar DWT → interpolate the flow path → train with a flow + power-spectrum loss → integrate 100 Euler steps → IDWT to recover the density field.
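The interpolation and sampling steps of this pipeline can be sketched as follows. This is a hedged illustration, assuming a linear (rectified-flow style) path between wavelet-space noise and the target coefficients; the source does not spell out the interpolant. The power-spectrum loss term and the conditioning on the observation are omitted, and `flow_path`, `flow_matching_loss`, and `euler_sample` are hypothetical names.

```python
import numpy as np

def flow_path(x0, x1, t):
    """Linear interpolant x_t = (1 - t) * x0 + t * x1 and its target velocity."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0          # constant velocity along a straight path
    return xt, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Stand-in data: 8 wavelet subbands on a 4^3 grid (hypothetical sizes).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4, 4, 4))   # wavelet-space noise
x1 = rng.standard_normal((8, 4, 4, 4))   # stand-in target coefficients

# An oracle velocity replaces the trained network in this sketch.
oracle = lambda x, t: x1 - x0
sample = euler_sample(oracle, x0, n_steps=100)
```

With the oracle velocity, Euler integration lands on `x1` up to floating-point error; in practice the velocity would come from the trained U-Net, and the resulting coefficients would go through the inverse DWT to recover the density field.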
Fig. 2 — Wavelet-aware 3D U-Net. A 16-channel input (8ch wavelet noise + 8ch conditioned observation) passes through encoder–decoder blocks with a fixed 8³ bottleneck. Scale-specific conditioning injects per-level wavelet features at each resolution via 1×1×1 convolutions. Cross-scale skip connections bridge encoder features to non-corresponding decoder levels, enabling multi-scale information flow beyond a standard U-Net.
- Scale-specific conditioning — per-level wavelet features injected via 1×1×1 convolutions
- Cross-scale skip connections — encoder features bridged to non-corresponding decoder levels
- BigGAN residual blocks — GroupNorm · SiLU · Gaussian Fourier time embeddings
- Fixed 8³ bottleneck — 2 / 3 / 4 encoder levels for 32³ / 64³ / 128³
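Two of the points above lend themselves to a small sketch: how a fixed 8³ bottleneck yields 2 / 3 / 4 encoder levels, and what a 1×1×1 convolution does to a conditioning tensor. The level-counting convention (the wavelet grid counts as the first level, the bottleneck as the last, with stride-2 downsampling in between) is an assumption that happens to reproduce the stated numbers; the channel width of 32 is hypothetical.

```python
import numpy as np

def encoder_levels(field_size, bottleneck=8):
    """Count resolutions from the wavelet grid down to the bottleneck."""
    size = field_size // 2      # single-level DWT halves each axis
    levels = 1
    while size > bottleneck:
        size //= 2              # assumed stride-2 downsampling per level
        levels += 1
    return levels

def conv1x1x1(x, weight, bias):
    """1x1x1 convolution: a per-voxel linear map over channels.

    x: (C_in, D, H, W), weight: (C_out, C_in), bias: (C_out,)
    """
    return np.einsum("oc,cdhw->odhw", weight, x) + bias[:, None, None, None]

levels = [encoder_levels(n) for n in (32, 64, 128)]

# Project 8 wavelet subbands of the observation to 32 feature channels.
rng = np.random.default_rng(0)
cond = rng.standard_normal((8, 16, 16, 16))
weight = rng.standard_normal((32, 8))
bias = rng.standard_normal(32)
features = conv1x1x1(cond, weight, bias)
```

Because a 1×1×1 kernel mixes channels without touching neighbouring voxels, it is a cheap way to match the conditioning features to the channel width at each resolution.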
Optimizer: AdamW lr=1e-4 | Grad clip: 1.0 | EMA: 0.999
Schedule: ReduceLROnPlateau (patience=5, factor=0.5)
Epochs: 100 (best val-loss) | Batch: 16 / 8 / 4 (32³/64³/128³)
Hardware: NVIDIA A100 80 GB
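For readers unfamiliar with these settings, the following dependency-free sketch shows what the EMA decay, gradient clipping, and plateau schedule do numerically. It is a simplified stand-in, not the repository's training loop. In PyTorch the corresponding utilities are `torch.optim.AdamW`, `torch.nn.utils.clip_grad_norm_`, and `torch.optim.lr_scheduler.ReduceLROnPlateau`. Clipping by global norm (rather than by value) is an assumption, and the plateau rule here ignores the improvement threshold and cooldown options of the real scheduler.

```python
import math

def ema_update(ema, params, decay=0.999):
    """Exponential moving average of parameters (EMA: 0.999)."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads]

class PlateauHalver:
    """Multiply the LR by `factor` after more than `patience` epochs without improvement."""

    def __init__(self, lr=1e-4, patience=5, factor=0.5):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr

# A validation loss that never improves after the first epoch.
scheduler = PlateauHalver()
lrs = [scheduler.step(1.0) for _ in range(7)]
clipped = clip_grad_norm([3.0, 4.0])
ema = ema_update([0.0], [1.0])
```

With patience 5, the learning rate stays at 1e-4 through five stalled epochs and is halved on the sixth.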
Three Quijote N-body suites · (1000 h⁻¹ Mpc)³ boxes · 512³ particles · fields at 32³, 64³, 128³
| Suite | Simulations | Split |
|---|---|---|
| Standard Latin Hypercube (LH) | 2,000 | 1,800 / 100 / 100 |
| Big Sobol Sequence (BSQ) | 1,000 | 8:1:1 |
| Non-Gaussian fNL LH | 1,000 | 8:1:1 |
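The splits in the table mix explicit counts with ratios. A short sketch, using a hypothetical helper `split_counts`, turns a ratio into train / validation / test counts; note that the Standard LH split corresponds to 18:1:1 rather than 8:1:1. How simulations are assigned to each split is not stated here, so only the counts are computed.

```python
def split_counts(n_sims, ratios=(8, 1, 1)):
    """Convert a train/val/test ratio into integer simulation counts."""
    total = sum(ratios)
    counts = [n_sims * r // total for r in ratios]
    counts[0] += n_sims - sum(counts)   # any rounding remainder goes to train
    return tuple(counts)

bsq_split = split_counts(1000)                 # BSQ and fNL LH suites
lh_split = split_counts(2000, (18, 1, 1))      # Standard LH suite
```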
Fig. 3 — Qualitative reconstruction at z = 127. Each row shows a 2D slice from a held-out Standard LH test simulation. Columns (left to right): present-day observation (z = 0), ground-truth initial conditions, diffusion baseline, Cosmo3DFlow reconstruction, and absolute error maps (darker = lower error). Cosmo3DFlow recovers sharp cosmic filaments and halo positions that the diffusion baseline blurs, achieving a 21% lower VRMSE at 128³.
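VRMSE is not defined on this page. The sketch below assumes the common variance-normalized convention, where the RMSE is divided by the standard deviation of the ground truth so that predicting the mean field scores about 1. It also reproduces the 21% figure from the Standard LH values at 128³ reported in the results tables (0.50 vs. 0.63). The `eps` term is an assumption for numerical safety.

```python
import numpy as np

def vrmse(pred, true, eps=1e-7):
    """Variance-normalized RMSE (assumed definition): sqrt(MSE / Var(true))."""
    mse = np.mean((pred - true) ** 2)
    return float(np.sqrt(mse / (np.var(true) + eps)))

rng = np.random.default_rng(0)
true = rng.standard_normal((16, 16, 16))          # stand-in density field
mean_baseline = np.full_like(true, true.mean())   # trivial predictor

score_perfect = vrmse(true, true)
score_mean = vrmse(mean_baseline, true)

# Relative improvement implied by the reported 128^3 values.
relative_gain = (0.63 - 0.50) / 0.63
```

Under this definition a value of 0.50 means the residual carries half the standard deviation of the true field.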
Fig. 4 — Sampling efficiency vs. reconstruction accuracy at 128³. Each point plots VRMSE against wall-clock sampling time for varying ODE step counts. Cosmo3DFlow (blue) is 4.4× faster per step due to 8× wavelet compression, and converges to lower VRMSE at just 100 steps than diffusion achieves at 1,000 — yielding a 50× end-to-end speedup (5.2 s vs. 243 s) with better quality.
Table 1 — Head-to-head comparison at 128³
| Metric | Cosmo3DFlow | Diffusion |
|---|---|---|
| Sampling time @ 128³ | 5.2 s | 243 s |
| Peak memory @ 128³ | 2.1 GB | 4.0 GB |
| ODE steps | 100 | 1,000 |
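The headline ratios follow directly from Table 1. The arithmetic below uses only the numbers in the table; the variable names are illustrative.

```python
# Values copied from Table 1 (128^3).
cosmo = {"time_s": 5.2, "mem_gb": 2.1, "steps": 100}
diffusion = {"time_s": 243.0, "mem_gb": 4.0, "steps": 1000}

speedup = diffusion["time_s"] / cosmo["time_s"]        # end-to-end sampling
memory_ratio = diffusion["mem_gb"] / cosmo["mem_gb"]   # peak memory
step_ratio = diffusion["steps"] / cosmo["steps"]       # ODE steps

# Time per ODE step implied by the table.
per_step_ratio = (diffusion["time_s"] / diffusion["steps"]) / (
    cosmo["time_s"] / cosmo["steps"]
)

print(f"speedup ~{speedup:.1f}x, memory ~{memory_ratio:.1f}x, "
      f"steps {step_ratio:.0f}x, per-step ~{per_step_ratio:.1f}x")
```

The end-to-end ratio comes to about 47×, which the page rounds to 50×. The per-step ratio implied by the table is about 4.7×, close to the 4.4× quoted with Fig. 4, which may have been measured separately.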
Fig. 5 — Convergence vs. number of ODE integration steps. Top: reconstructed density field slices at 10, 50, 100, and 500 steps. Bottom: VRMSE as a function of step count (lower = better). Cosmo3DFlow (blue) reaches its best reconstruction quality at 100 Euler steps and plateaus; the diffusion baseline (red) requires 1,000 steps to approach a higher error floor. The deterministic ODE trajectory in flat wavelet space enables stable large-step integration without quality degradation.
Fig. 6 — Statistical physics metrics on the Standard LH test set at 128³. Each panel plots a statistic vs. wavenumber k. Top: power spectrum P(k) — energy distribution across spatial scales; middle: cross-correlation C(k) between predicted and true density fields; bottom: transfer function T(k). Cosmo3DFlow (blue) achieves near-perfect agreement with ground truth (dashed) across all scales, while diffusion (red) degrades at high k. PS R² = 0.99 vs. 0.70 for diffusion.
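The three statistics in Fig. 6 can be estimated from FFTs of the predicted and true fields. The sketch below assumes the standard definitions C(k) = P_cross / sqrt(P_pred · P_true) and T(k) = sqrt(P_pred / P_true), which this page does not state explicitly. Physical normalization of P(k) (box volume, units of k) is omitted, wavenumbers are in grid units, and the function names and bin count are hypothetical.

```python
import numpy as np

def binned_spectra(a, b, n_bins=8):
    """Isotropically binned auto- and cross-power of two cubic fields."""
    n = a.shape[0]
    fa, fb = np.fft.fftn(a), np.fft.fftn(b)
    freqs = np.fft.fftfreq(n) * n                    # integer wavenumbers
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    k = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2).ravel()
    edges = np.linspace(0.5, n / 2, n_bins + 1)      # skips the k = 0 mode
    idx = np.digitize(k, edges) - 1                  # bin index of each mode

    def bin_mean(values):
        values = values.ravel()
        return np.array([values[idx == i].mean() for i in range(n_bins)])

    p_aa = bin_mean(np.abs(fa) ** 2)
    p_bb = bin_mean(np.abs(fb) ** 2)
    p_ab = bin_mean((fa * np.conj(fb)).real)
    return p_aa, p_bb, p_ab

def spectral_metrics(pred, true, n_bins=8):
    """Return P_pred(k), P_true(k), C(k), and T(k) per wavenumber bin."""
    p_pred, p_true, p_cross = binned_spectra(pred, true, n_bins)
    corr = p_cross / np.sqrt(p_pred * p_true)        # C(k)
    transfer = np.sqrt(p_pred / p_true)              # T(k)
    return p_pred, p_true, corr, transfer

rng = np.random.default_rng(0)
true = rng.standard_normal((16, 16, 16))             # stand-in density field
pred = 0.5 * true                                    # right phases, half amplitude
_, _, corr, transfer = spectral_metrics(pred, true)
```

In this toy case the phases match exactly, so C(k) is 1 in every bin while T(k) is 0.5, which illustrates how the two statistics separate phase errors from amplitude errors.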
Tables 2–4: All datasets × resolutions (each cell: Ours / Diffusion · ↓ lower is better · ↑ higher is better)
| Resolution | VRMSE ↓ | Corr ↑ | PS R² ↑ | Transfer Fn ↑ |
|---|---|---|---|---|
| 128³ | 0.50 / 0.63 | 0.88 / 0.82 | 0.99 / 0.70 | 0.99 / 0.80 |
| 64³ | 0.47 / 0.68 | 0.92 / 0.89 | 0.98 / 0.59 | 0.98 / 0.59 |
| 32³ | 0.34 / 0.82 | 0.96 / 0.85 | 0.95 / 0.48 | 0.95 / 0.48 |
| Resolution | VRMSE ↓ | Corr ↑ | PS R² ↑ | Transfer Fn ↑ |
|---|---|---|---|---|
| 128³ | 0.62 / 0.64 | 0.80 / 0.79 | 0.99 / 0.84 | 0.95 / 0.88 |
| 64³ | 0.53 / 0.65 | 0.88 / 0.88 | 0.98 / 0.83 | 0.94 / 0.81 |
| 32³ | 0.37 / 0.79 | 0.95 / 0.85 | 0.95 / 0.48 | 0.94 / 0.71 |
| Resolution | VRMSE ↓ | Corr ↑ | PS R² ↑ | Transfer Fn ↑ |
|---|---|---|---|---|
| 128³ | 0.56 / 0.59 | 0.86 / 0.83 | 1.00 / 1.00 | 0.98 / 0.98 |
| 64³ | 0.47 / 0.57 | 0.93 / 0.89 | 1.00 / 1.00 | 0.99 / 0.99 |
| 32³ | 0.31 / 0.67 | 0.97 / 0.87 | 1.00 / 0.98 | 0.99 / 0.98 |
git clone https://github.com/khairul-me/Cosmo3DFlow.git
cd Cosmo3DFlow
pip install -r requirements.txt

We acknowledge support from the National Science Foundation under Cooperative Agreement 2421782 and the Simons Foundation award MPS-AI-00010515 and Seed Grant AWD-006703 (UVA00002858-AS-ASTR-NSF Simons CosmicAI). We thank the Quijote team for making their N-body suite publicly available. We are grateful for the UVA Research Computing resources and support.
University of Virginia · University of Texas at Austin


