Sparse pullback for big performance gain#2170

Open
unalmis wants to merge 89 commits into master from ku/sparse_pullback

Conversation

@unalmis

@unalmis unalmis commented Apr 17, 2026

Collaborator

  • Resolves Sparsity preserving pullbacks #2168 .
    • Yields a factor of $N$ improvement in memory and compute speed when computing the Jacobian in $R^{N \times M}$ of a map $f \colon R^M \to R^N$. Basically, if you had some expensive function, with proper application throughout you could get the full Jacobian for around the cost of a single row before.
    • $N^2$ for checkpointed functions.
    • Even if you are only doing a single vjp, it is much faster and more memory efficient for the reasons discussed in the linked threads.
    • Could do sparse linear algebra on the cotangent update too, but I didn't assume that the cotangent is sparse.
    • Functions added are sparse_pullback and sparse_pullback_map
  • Removes
    • bounce1d optimization.
    • Unneeded flags (is_reshaped, is_fourier) that users said were confusing (backwards compatible) as well as the developer flags Bref, Lref that should not be there.
  • Previously, pitch_batch_size was getting ignored. This fixes that by adding strip_dim0 flag to batch_map.
  • Switch resolution to per field period to simplify use and analysis #2182
  • Resolves the fixme comment so that gradients are consistent #2185
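
A minimal JAX sketch of why a structured Jacobian can cost about one pullback. This is not the PR's `sparse_pullback` implementation; the function `f` and the shapes `N`, `K` are made up for illustration. It assumes each output depends only on its own block of inputs, so a single vjp with an all-ones cotangent returns every nonzero block at once.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Hypothetical map: x has shape (N, K) and output i depends only on row i,
    # so the Jacobian is block diagonal.
    return jnp.sum(jnp.sin(x) * x, axis=1)


N, K = 4, 3
x = jnp.arange(1.0, N * K + 1).reshape(N, K) / (N * K)

# Dense route: reverse mode pulls back one cotangent per output row.
J_dense = jax.jacrev(f)(x)  # shape (N, N, K), mostly zeros

# Structured route: one pullback with a ones cotangent. No two outputs share
# an input, so the nonzero blocks do not overlap in the result.
_, pullback = jax.vjp(f, x)
(blocks,) = pullback(jnp.ones(N))  # shape (N, K)

idx = jnp.arange(N)
assert jnp.allclose(J_dense[idx, idx], blocks)
```

The dense route needs `N` pullbacks and stores `N * N * K` entries, while the structured route needs one pullback and stores `N * K`.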

notes

  • They are decorator functions, so it is non-invasive and can improve many objectives such as bounce, balloon, qs, omni, surface integral, free surface etc., but more generally can apply to any atomic computation. Trivial to use now that I have done all the math and resolved the implementation sharp bits.
  • for example the 4d singular integral kernel can be reduced to 2d pullback
  • For this PR, I only apply the decorator at one point in bounce. Because it's done at an intermediate point, I needed to change some code. Note the CI benchmarks won't show that improvement since those objectives map onto $R^1$, but benchmarking with TensorBoard indeed shows the factor of $N$ improvement in speed and memory when computing the Jacobian. And this is not yet applied to the full pipeline. See Single contangent pullback through compute pipeline #2171 .
  • Unrelated to this PR since it's always been like this, but computing the proximal projection Jacobian with the Force balance constraint is more expensive than computing the bigger unconstrained Jacobian, since the svd solve is now way more expensive than computing a vjp of bounce integrals. Perhaps the decorator could be applied to stuff there; I didn't look.
  • If the partial summation in Poloidal FFT Implementation #1508 is done so that the transforms are factorized, then we can extend the pullback wrappers to include the spectral to real space transform on each surface by decorating the larger function with a sparse_pullback as well. Or use Fourier-Chebyshev as I had suggested in Making transforms more efficient #1243 to make it simpler. This would have the advantage of further reducing memory.
  • This should renew interest in Upsample data above midplane to full grid assuming stellarator symmetry #1206 and Compute function and gradient simultaneously for reverse mode #1872 and Generalize toroidal angle beyond phi cylindrical #465.

@unalmis unalmis requested review from ddudt, dpanici, f0uriest and rahulgaur104 and removed request for ddudt, dpanici, f0uriest and rahulgaur104 May 27, 2026 23:06
Comment thread desc/integrals/_bounce_utils.py
Comment thread desc/integrals/_bounce_utils.py
Comment thread desc/integrals/bounce_integral.py
Comment thread desc/integrals/bounce_integral.py
Comment thread desc/integrals/bounce_integral.py
Nemov=True,
**kwargs,
):
errorif(

Member

The other issue here is that using vjp prevents using standard jax.hessian, which is forward over reverse. If reverse is really always significantly faster then that may be ok, but it's worth getting other folks' input.

If we do end up only supporting reverse mode it would be good to add to the Notes section of the docstring to remind users (and maybe eventually having a reverse only version of objective.hess)
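
For reference, a small sketch of the two compositions being discussed, on a plain toy function (hypothetical, not a DESC objective). `jax.hessian` composes forward over reverse, while nesting `jax.jacrev` twice stays in reverse mode; for an ordinary differentiable function both give the same matrix.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Toy scalar function standing in for an objective.
    return jnp.sum(jnp.sin(x) * x**2)


x = jnp.array([0.5, 1.0, 1.5])

# Standard Hessian: forward mode applied to the reverse-mode gradient.
H_fwd_over_rev = jax.hessian(f)(x)

# Reverse-only alternative: reverse mode applied to the reverse-mode gradient.
H_rev_over_rev = jax.jacrev(jax.jacrev(f))(x)

assert jnp.allclose(H_fwd_over_rev, H_rev_over_rev, atol=1e-5)
```

Whether the reverse-only composition is competitive in cost for the objectives in this PR is a separate question that this sketch does not answer.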

Comment thread desc/objectives/_neoclassical.py
Comment thread desc/objectives/_neoclassical.py Outdated
Nemov=True,
**kwargs,
):
errorif(

Member

It looks like in theory forward mode could still be possible by passing sparse=False when computing the bounce integrals? (assuming #1489 is fixed)

Comment thread CHANGELOG.md Outdated
@unalmis unalmis requested review from ddudt, dpanici, f0uriest and rahulgaur104 and removed request for ddudt, dpanici, f0uriest and rahulgaur104 June 8, 2026 21:53

@ddudt ddudt left a comment

Collaborator

I haven't reviewed all the code yet, but I think this introduces some breaking changes. I'm not sure what the best solution is: ideally we would keep support for forward mode, but we at least need to add more error handling.

Nemov=True,
**kwargs,
):
errorif(

Collaborator

thought(blocking)
Forward mode is actually very common. Individual objectives often prefer reverse mode, but typically we optimize with a list of many different objectives. If you set ObjectiveFunction(objectives, deriv_mode="batched") then it will use deriv_mode="fwd" for all of the sub-objectives. From the docs:

deriv_mode : {"auto", "batched", "blocked"}
Method for computing Jacobian matrices. batched uses forward mode, applied
to the entire objective at once, and is generally the fastest for vector
valued objectives. ...

So unfortunately I think we do need to make this PR contingent on a solution to using forward mode, because otherwise these changes will break user code. Also, I think we have been very careful to always support both forward and reverse mode for all objectives as a design principle, so at the very least we should discuss the consequences in more detail before we abandon it.
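
To make the concern concrete, here is a toy JAX sketch, not DESC's `ObjectiveFunction`. The two sub-objectives are hypothetical, and the per-objective variant is only an assumption about what a block-wise mode could look like. It shows that forward mode over the stacked residual requires every sub-objective to support `jvp`.

```python
import jax
import jax.numpy as jnp


# Hypothetical sub-objectives sharing the same parameters p in R^3.
def objective_a(p):
    return jnp.sin(p) * p[0]  # R^3 -> R^3


def objective_b(p):
    return jnp.array([jnp.sum(p**2)])  # R^3 -> R^1


def combined(p):
    # Stack all residuals into one vector, as a combined objective would.
    return jnp.concatenate([objective_a(p), objective_b(p)])


p = jnp.array([0.3, -1.2, 2.0])

# Forward mode over the whole stacked residual: every piece must support jvp.
J_batched = jax.jacfwd(combined)(p)

# Per-objective alternative: each block picks its own mode, then blocks are stacked.
J_blocked = jnp.concatenate(
    [jax.jacfwd(objective_a)(p), jax.jacrev(objective_b)(p)]
)

assert J_batched.shape == (4, 3)
assert jnp.allclose(J_batched, J_blocked, atol=1e-6)
```

If one sub-objective only supported reverse mode, the first call would fail for the whole list, while the per-objective variant could still choose `jacrev` for that block.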

normalize_target=True,
loss_function=None,
deriv_mode="auto",
jac_chunk_size=None,

Collaborator

suggestion
Even if jac_chunk_size isn't used in the code anymore, I think we should still throw a deprecation warning. If users had been passing this argument before, we want to warn them that they can no longer do what they had been doing before.
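
A minimal sketch of what such a warning could look like, using only the standard library `warnings` module. The function `make_objective` and the message text are hypothetical stand-ins; the real constructor and any project-specific warning helper may differ.

```python
import warnings


def make_objective(deriv_mode="auto", jac_chunk_size=None):
    """Hypothetical constructor; only the deprecation handling matters here."""
    if jac_chunk_size is not None:
        # Keep accepting the argument so existing calls do not break,
        # but tell the caller that it no longer has any effect.
        warnings.warn(
            "jac_chunk_size is deprecated and is ignored.",
            DeprecationWarning,
            stacklevel=2,
        )
    return {"deriv_mode": deriv_mode}


# Calls that never passed the argument stay silent.
make_objective()
```

Using `stacklevel=2` points the warning at the user's call site rather than at the constructor itself.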

@unalmis

unalmis commented Jun 15, 2026

Collaborator Author

@unalmis unalmis mentioned this pull request Jun 15, 2026
@unalmis unalmis requested a review from ddudt June 17, 2026 02:56

Labels

  • AD: related to automatic differentiation
  • P∞: P_infty. Ready to merge > 1 years. priority to merge to prevent further delay of research.
  • performance: New feature or request to make the code faster
  • stable: Besides merging master, other updates require a child PR that should be merged to master later.
  • theory: Requires theory work before coding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sparsity preserving pullbacks

5 participants