ideal_mhd_model: export SIMSOPT-QS quantities via a flat-buffer Enzyme-ready path by krystophny · Pull Request #584 · proximafusion/vmecpp

krystophny · 2026-06-14T20:50:00Z

Stacks on #582 (exact autodiff HVP). Merge order: after #582.

Exports every quantity SIMSOPT's QuasisymmetryRatioResidual reads, computed from
the equilibrium state through a flat-buffer, allocation-free path instead of the
Eigen RowMatrixXd transform in output_quantities (which Enzyme cannot
differentiate). This is the groundwork for an exact, finite-difference-free QS
boundary adjoint.

ComputeQsHarmonics (qs_harmonics_kernel.h): plain-double forward transform of
the half-grid field harmonics gmnc, bmnc, bsub{u,v}mnc, bsup{u,v}mnc from the
real-space fields the force chain already produces (gsqrt, bsup{u,v},
bsub{u,v}; |B| from total_pressure). Mirrors the reference integral bit for
bit, including the guarded Nyquist cos-basis halving.
VmecModel.qs_harmonics() (pybind): returns the six harmonics plus the
iotas/bvco/buco profiles.

Verification

qs_harmonics_kernel_test (new) checks all six harmonics against
output_quantities on solovev to 1e-7 of each quantity's scale: PASSED.
Cross-checked from Python against a full vmecpp.run(): gmnc/bmnc/bsupu/bsupv to
~1e-7, iotas exact, bvco/buco to ~1e-7. bsubu/bsubv differ ~1e-4 across two
independent solves (lambda-sensitive), exact on a fixed state.

Follow-up (next PR): reverse-mode Enzyme over this kernel for an exact
dQS/dx, replacing the finite-difference objective-state-gradient in the
adjoint.

Tracking: #592

The CMake FetchContent abseil pin (2024-08) fails to compile under Clang >= 21: absl::Nonnull SFINAE in absl/strings/ascii.cc and the numbers.cc nullability annotations are rejected by the newer frontend. Bump to the 20260107.1 LTS, which compiles cleanly under Clang 21.1.8 and GCC. Clang is the compiler required for the Enzyme autodiff build. The Bazel build keeps its own (BCR) abseil pin and is unaffected.

Add VMECPP_ENABLE_ENZYME (OFF by default), which requires a Clang compiler and a ClangEnzyme plugin path and builds a self-contained autodiff smoke test. The test differentiates a scalar objective written over Eigen::Map'd caller buffers and checks reverse- and forward-mode Enzyme gradients against the closed form and central finite differences. enzyme.h documents the intrinsic ABI and the allocation constraint that shapes the differentiable kernels: Enzyme cannot track Eigen's aligned allocator, so differentiable paths use Eigen::Map over caller-owned buffers and avoid heap expression temporaries. With the option off the build is unchanged.

Add a precondition flag to VmecModel.evaluate (default true, unchanged behaviour). With precondition=false the forward model returns at the INVARIANT_RESIDUALS checkpoint, so get_forces() yields the raw, unpreconditioned force: the gradient of VMEC's augmented functional (MHD energy plus the spectral-condensation and lambda constraints) with respect to the decomposed internal-basis state. This is the consistent state/gradient pair an external optimizer needs to minimise in VMEC's own basis. The native solver's preconditioned search direction (precondition=true) is a different vector; the raw gradient is the equilibrium residual and vanishes at convergence. Tests: raw force is finite and differs in direction from the preconditioned force, and drops by >1e6 from the initial guess to the converged equilibrium.

Treat the equilibrium as the root problem F(x) = 0, where F is the raw internal-basis force (gradient of VMEC's augmented functional) exposed by evaluate(precondition=False). Wire it to two solvers that reuse VMEC++'s forward model: native-style preconditioned descent and Jacobian-free Newton-Krylov (matrix-free Hessian information). Both reach the native solver's equilibrium. This is the external-differentiability path: VMEC++ as a differentiable equilibrium component an outside optimizer can drive. Quasi-Newton root-finders without a preconditioner diverge on this stiff system, which motivates exposing VMEC's preconditioner as an operator next. Tests assert both solvers reach force balance and recover the native energy and state.

Add VmecModel.apply_preconditioner(v): applies VMEC's preconditioner M^-1 (m=1, radial, lambda steps) to a vector in the decomposed basis. M^-1 is VMEC's hand-built approximate inverse Hessian; this exposes it as a reusable linear operator for preconditioned Krylov / quasi-Newton and for the Hessian solve in adjoint sensitivities. It requires a prior evaluate(precondition=true), which assembles the radial preconditioner. Validated exactly: apply_preconditioner(raw force) equals the native preconditioned search direction; the operator is linear and, once assembled, state-invariant. Use it as the inner Krylov preconditioner in Newton-Krylov: on solovev (ns=11) this cuts force evaluations from 2242 to 505 (4.4x) versus unpreconditioned JFNK, converging to the same equilibrium.

Add VmecModel.hessian_vector_product(v): the curvature of VMEC's augmented functional, computed inside VMEC++ as a central directional derivative of the analytic force (its gradient). The force is exact; only the directional step is finite-differenced. Add a force_eval_count for fair cross-optimizer cost comparison (counts evaluations hidden in the Hessian-vector products). Drive a true Newton-Krylov from this HVP plus the preconditioner: it reaches the equilibrium in ~7 outer iterations (second order) versus ~1300 descent steps. This is the inside-the-solver Hessian path; together with the external optimizers it gives differentiability inside and out. Benchmark (solovev, ns=11, force evals counted in VMEC++): preconditioned descent 2606 evals 1302 iters Newton-Krylov (JFNK) 2243 evals Newton-Krylov (preconditioned) 507 evals Newton (VMEC++ HVP + M^-1) 9194 evals 7 iters The HVP-Newton's higher force-eval count (two evals per finite-difference HVP) is what the exact Enzyme Hessian will remove.

The force kernel allocated 17 dynamic Eigen vectors per radial surface (the _o half-grid quantities and the avg/wavg surface averages). Move them to preallocated per-thread ThreadLocalStorage scratch and assign in place, so the radial loop allocates nothing. Two benefits: it removes per-surface heap churn from the hot force loop, and it makes the kernel differentiable by Enzyme, which cannot trace dynamic Eigen temporaries (forward and reverse mode both abort on them). This is the allocation-free prerequisite for an exact autodiff Hessian. Pure refactor, identical arithmetic. Verified bit-for-bit: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

The full Newton step overshoots on stiff 3D equilibria (cth_like stalled at the iteration cap with ||F|| ~ 5e-2). Add a backtracking line search on ||F|| so each step is damped to a decrease. With it the HVP-Newton converges on cth_like in 9 outer iterations (||F|| = 1.8e-10) and still converges solovev in 8.

Add the implicit-function adjoint that turns VMEC++ into a gradient-providing equilibrium component for SIMSOPT, the original goal. vmecpp_adjoint.py: for a converged fixed-boundary equilibrium F_I(x)=0, the boundary sensitivity of a scalar objective J follows from H_II lambda = dJ/dx_I, dJ/dx_B = dJ/dx_B - (dF_I/dx_B)^T lambda, with H the symmetric Hessian of the augmented functional. It is matrix-free via hessian_vector_product and apply_preconditioner (the SPD interior system is solved with preconditioned CG). One Hessian solve gives the whole boundary gradient, versus one equilibrium re-solve per boundary DOF for finite differences. simsopt_vmec_gradient.py: VmecEnergy wraps this as a SIMSOPT Optimizable whose dJ is the adjoint gradient, plus a gradient-cost benchmark. Verified: the adjoint gradient matches brute-force re-solve finite differences (rel 2.4e-4) and the SIMSOPT Optimizable's dJ matches finite differences of J (rel ~1e-6). On solovev (ns=11, 18 boundary DOFs) the adjoint boundary gradient costs 762 force evaluations versus 9112 for finite differences (12x), and the gap grows with the boundary DOF count.

Two correctness fixes for stiff 3D equilibria (cth_like): - VMEC's augmented-Lagrangian Hessian is symmetric *indefinite* (the lambda constraint makes it a saddle, not a minimum), so CG silently gives the wrong adjoint there. Use GMRES, which handles indefinite systems, for the H_II solve and the interior Newton solve. With a loose, restarted tolerance the adjoint solve stays cheap. - Add a backtracking line search to solve_interior so the interior re-solve (used by the SIMSOPT wrapper and the finite-difference reference) converges on 3D instead of overshooting. Verified with a directional-derivative check against a re-converged finite-difference reference: solovev 1.5e-4, cth_like 2.2e-2 relative; both previously agreed only in 2D. Boundary-gradient cost on solovev: 626 force evaluations (analytic adjoint) versus 10460 (finite differences).

The forces transform materialized two per-(surface,m,zeta) Eigen temporaries (tempR_seg, tempZ_seg) inside the inner loop. Reuse per-thread scratch instead, so the whole FFTX-off force path (geometryFromFourier, computeJacobian/Metric/BContra/BCo, pressureAndEnergies, computeMHDForces, forcesToFourier) is now allocation-free end to end. Same arithmetic as the previous .eval(); verified bit-for-bit: solovev 2.548352e+00, cth_like_fixed_bdy 5.057191e-02.

Demonstrate exact automatic differentiation of a real VMEC nonlinear kernel. JacobianKernel reproduces IdealMhdModel::computeJacobian (half-grid r12/ru12/zu12/rs/zs and the Jacobian tau), written allocation-free over flat buffers, which is the form Enzyme differentiates. For L = 0.5||outputs||^2 the test computes dL/dgeom by reverse mode and the directional derivative dL.v by forward mode, checks both against central finite differences, and against each other: reverse dL.v vs FD : 1.9e-9 forward dL.v vs FD : 1.9e-9 forward vs reverse : 2.9e-15 performance: reverse ~16 us/pass (full gradient), forward ~16 us/pass (one direction) Reverse returns the whole gradient per pass and wins for a scalar gradient; forward is the cheaper primitive for a single Jacobian/Hessian-vector product. tau is nonlinear in the geometry, so this kernel's Jacobian is a genuine building block of the exact MHD force Hessian; the remaining force chain follows the same allocation-free pattern.

Move the half-grid Jacobian arithmetic into jacobian_kernel.h (ComputeHalfGridJacobian), allocation-free over flat buffers. Production computeJacobian now calls it (followed by the unchanged Jacobian-sign check), and the Enzyme forward/reverse test differentiates the same kernel: one implementation, no duplication. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Autodiff test still matches finite differences and agrees forward vs reverse to 3e-15.

Extract computeMetricElements into the shared, allocation-free kernel ComputeMetricElements (metric_kernel.h), over flat buffers, and call it from the solver. guv and the 3D part of gvv are computed only when lthreed, matching the original. This is the second force-chain kernel made Enzyme-differentiable (composed into the exact Hessian-vector product later), following the Jacobian kernel pattern. Bit-exact: vmec_standalone MHD energy unchanged on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D path with guv/gvv).

Factor the bsupu/bsupv arithmetic out of computeBContra into the shared, allocation-free kernel ComputeBsupContra (bcontra_kernel.h). The lambda normalization (lamscale, + phi') and the chi'/iota profile and toroidal-current-constraint logic stay in the solver verbatim, since they mutate state and update profiles; only the differentiable field arithmetic moves to the shared kernel. Bit-exact across 1 and 4 threads (so the ghost-cell radial partitioning is exercised) on solovev (2.548352e+00, 2D) and cth_like_fixed_bdy (5.057191e-02, 3D).

Extract the metric index-lowering (bsubu = guu B^u + guv B^v, bsubv = guv B^u + gvv B^v; guv absent in 2D) from computeBCo into the shared, allocation-free kernel ComputeBCo (bco_kernel.h). Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Extract the field-dependent magnetic pressure |B|^2/2 = 0.5(B^u B_u + B^v B_v) from pressureAndEnergies into the shared, allocation-free kernel ComputeMagneticPressure (pressure_kernel.h). The kinetic-pressure profile and the energy volume integrals stay in the solver. Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02). Completes the point-local nonlinear force-chain kernels (Jacobian, metric, B^contra, B_cov, pressure).

Extract computeMHDForces' real-space force-density assembly (armn/azmn/ brmn/bzmn, and crmn/czmn in 3D, even+odd) into the shared, allocation-free kernel ComputeMHDForceDensity (mhdforce_kernel.h). The Eigen arithmetic is preserved verbatim over flat-buffer Eigen::Map views with caller-owned handover/average scratch, so it is bit-for-bit identical. This is the sixth and final point-local force-chain kernel; the six (Jacobian, metric, B^contra, B_cov, pressure, force) now form the local map geometry -> force density, ready to compose into the exact Hessian-vector product. (This branch also merges the allocation-free force kernel, #12, which removes the per-surface heap temporaries this extraction relies on.) Bit-exact across 1 and 4 threads on solovev (2.548352e+00) and cth_like_fixed_bdy (5.057191e-02).

Compose the six shared force-chain kernels (Jacobian, metric, B^contra, B_cov, magnetic pressure, MHD force density) into the single local map g: real-space geometry -> real-space force density, the nonlinear core of VMEC's force. The full MHD force is T^T . g . T with the linear spectral transforms; the exact force Hessian-vector product is therefore T^T . J_g . T . v, and this provides J_g by autodiff. The new test takes the Jacobian of g by forward and reverse Enzyme modes over flat allocation-free buffers, checks both against central finite differences and against each other, and times one forward Jacobian-vector pass against the two force evaluations a finite-difference HVP costs.

Extract hybridLambdaForce's full-grid lambda force (blmn, and clmn in 3D) into lambda_force_kernel.h (ComputeHybridLambdaForce), shared between the solver and the Enzyme autodiff path. The method drops from 115 lines to a single kernel call; the OpenMP barriers stay in the method. The kernel is allocation-free over flat buffers and preserves the radial sweep that carries the inside half-grid point in scratch and shifts it outward each surface, plus the blend of the two bsubv interpolations. This is the lambda-force piece of the augmented functional, the second nonlinear force-density term after the MHD force chain.

Extract the two local (non-transform) pieces of the spectral-condensation constraint force into constraint_force_kernel.h, shared between the solver and the Enzyme autodiff path: - ComputeEffectiveConstraintForce: gConEff = (rCon-rCon0) ru + (zCon-zCon0) zu (effectiveConstraintForce), skipping the axis surface. - AddConstraintForces: add the bandpass-filtered gCon back into the MHD R/Z forces and write frcon/fzcon (the constraint part of assembleTotalForces). The Fourier-space bandpass between them stays the shared free function deAliasConstraintForce; the free-boundary rBSq contribution stays in assembleTotalForces. Allocation-free over flat buffers. This completes the local force-density terms of the augmented functional (MHD + lambda + constraint), the nonlinear core of the exact Hessian.

Add the hybrid lambda force (lambda_force_kernel.h) to the composed local map g and differentiate the combined MHD-plus-lambda force density by forward and reverse Enzyme modes. This proves J_g for the second nonlinear force-density term, not just the MHD force chain. The spectral-condensation constraint force also carries a linear Fourier bandpass; it is validated end-to-end against the finite-difference HVP in the pybind exact-HVP path rather than in this flat-buffer microtest.

Move the composed local force map g (MHD force chain + hybrid lambda force) into local_force_composition.h (ComputeLocalForceDensity), parameterized by the radial partition offsets so the same composition serves the Enzyme autodiff test and the exact Hessian-vector product over the live model state. The autodiff test now calls the shared composition; forward/reverse vs finite-diff and forward/reverse agreement are unchanged (2.55e-8, 4e-15).

Add exact_force_jvp.cc/.h: ExactForceDensityJvp wraps one Enzyme forward pass over ComputeLocalForceDensity, returning the force-density tangent for a geometry tangent. This translation unit is compiled with the Clang/Enzyme plugin and is the single nonlinear pass the exact Hessian-vector product calls; the rest of VMEC++ stays normally compiled and wraps it with the linear spectral transforms. The autodiff test now also validates this standalone entry point against a finite difference of the composition's force-density output.

Add IdealMhdModel::applyExactForceJacobian: given the packed real-space geometry primal and a geometry tangent, differentiate the MHD-plus-lambda force density by one Enzyme forward pass (ExactForceDensityJvp), scatter the tangent into the real-space force members, then apply the linear forward transform and preconditioner decomposition (forcesToFourier, decomposeInto, m1Constraint, zeroZForceForM1) exactly as the tail of update() does. The geometry tangent is supplied by the caller via the linearity of geometryFromFourier (geom(x+v) - geom(x)), so the only nonlinear step is the single Enzyme pass. The constraint force (a linear Fourier bandpass over a nonlinear product) is omitted here and added separately. Compiles under clang-21; end-to-end exact-vs-finite-difference validation of the assembled Hessian-vector product against VmecModel runs once the internal solver stack (VmecModel HVP + preconditioner) is merged with this kernel stack.

Wire the exact force Hessian-vector product through to Python: VmecModel.exact_hessian_vector_product(v) computes H v = T^T J_g T v with one Enzyme forward pass over the local force-density composition, using the linearity of geometryFromFourier for the exact geometry tangent (geom(x+v) - geom(x)) and the existing forward transform + decomposition for the output. exact_force_jvp.cc is compiled into the core library with the Enzyme plugin; VMECPP_ENABLE_ENZYME guards the callers so the default plugin-free build is unchanged. Built into the extension with clang-21 + Enzyme. The result is 96% cosine aligned with the finite-difference HVP on solovev; the remaining difference is the spectral-condensation constraint force, which is omitted from this pass and added next (it carries a linear Fourier bandpass over a nonlinear product).

Pointer-ize the constraint-force bandpass (ComputeDeAliasConstraintForce in constraint_force_kernel.h, explicit reductions so it differentiates under Enzyme) and have the free function deAliasConstraintForce call it; the solver energy stays bit-exact at 1 and 4 threads. Extend ComputeLocalForceDensity with an optional constraint stage (geometry blocks 16-19 = rCon/zCon/ruFull/zuFull, force blocks 16-19 = frcon/fzcon) and enable it in applyExactForceJacobian, with VmecModel packing the constraint geometry. Status: the exact HVP runs end-to-end and is 96% cosine-aligned with the finite-difference HVP on solovev. Including the constraint changes the result only marginally, so the remaining ~32% magnitude/direction difference is in the MHD/lambda or transform path, not the constraint; reaching exact==FD needs tracing the residual geometry dependence (e.g. the iter==iter rCon0 volume extrapolation), which is the next debugging step before benchmarking.

… HVP Add composedForceResidual (exposed as VmecModel.composed_force_residual): the max difference between the flat-buffer force-density composition and the production force-density members at the current state. It is exactly 0.0 on solovev and cth_like, proving the composition (all kernels + constraint + bandpass) reproduces the production force density bit-for-bit. Gate zeroZForceForM1 in applyExactForceJacobian on fsqz < 1e-6, matching the tail of update(). Diagnosis of the end-to-end exact-vs-FD HVP gap (~27%, eps-independent, and the exact operator is linear to 0.4%): with the composition proven exact and the forward transform matching update(), the residual is isolated to the geometry-tangent / autodiff-of-the-2D-path. The microtest validates the Enzyme JVP only for the 3D composition; the 2D (axisymmetric) JVP and the linearity of the packed geometry tangent are the next things to instrument.

…==FD Two fixes make the exact Hessian-vector product match the finite-difference HVP to finite-difference precision: 1. Geometry tangent by small central difference. geom(decomposed_x) is only near-linear (the preconditioner scaling is frozen per step but the transform still has curvature), so the unit-step difference geom(x+v)-geom(x) contaminated the tangent with higher-order terms (~30% error). Use a small central step; the nonlinear force kernels are still differentiated exactly by the single Enzyme pass. 2. Compute the constraint reference rCon0/zCon0 inside the composition by the rzConIntoVolume extrapolation (rCon0[jF] = rCon[LCFS] * s_full) instead of freezing it. It is linear in the geometry and recomputed every step by the solver, so freezing it left a ~3% residual. Validation (exact vs FD HVP, eps-independent): solovev 3.9e-6 (cos 1.000000), cth_like 4.2e-3 (cos 0.999991). solovev (ncurr=0) is exact to the FD floor; the cth_like residual is the ncurr=1 chi' = f(geometry) term in computeBContra, which is still frozen and is the next term to differentiate. Adds isolation diagnostics force_density_jvp_residual and (earlier) composed_force_residual.

solve_newton_exact_hvp drives the globalized preconditioned Newton-Krylov with VmecModel.exact_hessian_vector_product (one Enzyme pass) instead of the finite-difference HVP. Requires an Enzyme-enabled build.

…pdate Replace the finite-difference geometry tangent (and its full update() calls) with the exact linear pre-chain applied directly to the perturbation: IdealMhdModel::packGeometry runs decomposeInto + m1Constraint + extrapolate + geometryFromFourier on a vector and packs the 20-block geometry with the computeBContra lambda normalization. Applied to the state it gives the geometry (primal, with phipF on lu_e); applied to v it gives the exact geometry tangent, since the chain is linear. The only nonlinear step is the single Enzyme pass. The exact HVP is now fully analytic/autodiff (no finite difference anywhere) and does no full force evaluation per matvec. Accuracy is unchanged (exact vs FD HVP 3.9e-6 solovev, 4.2e-3 cth_like). Force-evaluation count of the exact-HVP Newton drops from 14136 to 27 on solovev (52 on cth_like), since matvecs no longer call update(); wall-clock is still dominated by the GMRES matvecs (each a transform + Enzyme pass), where preconditioned JFNK remains faster.

The primal geometry depends only on the state, not the Krylov vector, so a GMRES solve recomputed it on every matvec. Cache it (invalidated by SetState), halving the geometry-transform work per matvec. Accuracy unchanged (exact vs FD 3.9e-6 solovev); wall-clock drops (cth_like internal 25.7->21.8s, adjoint 11.5->10.4s).

For a constrained toroidal-current profile (ncurr==1) chi' is recomputed from the geometry every step (chi' = (currH - jvPlasma)/avg_guu_gsqrt with jvPlasma, avg_guu_gsqrt surface integrals of the metric and field), so freezing it left a 0.4% residual on cth_like. Compute chi' inside the composition from the pre-chi' contravariant field; ncurr==0 keeps the frozen iota*phi' profile. Exact vs FD HVP now: solovev 3.9e-6, cth_like 6.6e-6 (was 4.2e-3) - exact to the finite-difference floor on both cases.

benchmark_exact_hvp.py: preconditioned JFNK vs FD-HVP vs exact-HVP Newton-Krylov (internal solver). benchmark_adjoint_gradient.py: FD-over-boundary vs FD-HVP adjoint vs exact-HVP adjoint (external/SIMSOPT boundary gradient). These produce the performance tables in the PR descriptions; the exact-HVP rows need an Enzyme-enabled build.

boundary_gradient and _interior_operators take exact=True to use exact_hessian_vector_product (no force evaluation per matvec); the SIMSOPT VmecEnergy.gradient wrapper uses it by default. A real SIMSOPT least_squares_serial_solve loop (minimize (energy-target)^2 over the boundary) converges with this analytic gradient to |J-target|/target = 7e-7 in 10 iterations.

…ts JFNK The inner Krylov solve dominated wall-clock: the augmented Hessian is indefinite, so a fixed tight inner tolerance made GMRES take ~580 matvecs per Newton step even though only ~5-10 outer steps are needed. Profiling: 8 Newton iters but 4655 matvecs, one exact HVP matvec is 0.04 ms, and only 21% of the time was in the HVP. Fix: Eisenstat-Walker adaptive inner forcing (loose-early/tight-late) and lgmres (recycles Krylov vectors), applied to both Newton-HVP solvers (both already preconditioned by VMEC's M^-1). Final, fair comparison (all preconditioned, ns=11): solovev: precond JFNK 507 ev / 0.08 s ; exact-HVP Newton 17 ev / 0.03 s cth_like: precond JFNK 1633 ev / 2.00 s; exact-HVP Newton 26 ev / 1.32 s The exact-autodiff-HVP Newton-Krylov now beats preconditioned JFNK on both cases in both force evaluations (30x / 63x fewer) and wall-clock (2.7x / 1.5x faster). Also add examples/simsopt_optimization_loop.py: a real SIMSOPT least_squares_serial_solve loop driving the boundary to a target energy via the analytic adjoint gradient (converges to |J-target|/target = 7e-7).

…t fallback Apply pre-commit formatting across the touched files. Make the SIMSOPT VmecEnergy.gradient auto-detect the exact HVP: use exact_hessian_vector_product when the extension was built with Enzyme, otherwise fall back to the finite-difference HVP, so the default (plugin-free) build -- and test_simsopt_gradient -- pass without an Enzyme build.

…VP wrapper

…mhd_model ideal_mhd_model.cc includes the header-only force kernels and the autodiff composition/JVP header; the bazel sandbox needs them declared or the bazel builds (opt/asan/ubsan/tsan, test_bazel) fail with 'jacobian_kernel.h: No such file'. Add them via glob so each branch picks up whatever exists on it. The Enzyme JVP .cc stays CMake-only (guarded by VMECPP_ENABLE_ENZYME).

ComputeQsHarmonics forward-transforms the half-grid fields SIMSOPT's QuasisymmetryRatioResidual needs (gmnc, bmnc, bsub{u,v}mnc, bsup{u,v}mnc) from the real-space buffers the force chain already produces (gsqrt, bsupu, bsupv, bsubu, bsubv; |B| formed inline). Plain double* with explicit reductions, mirroring the output_quantities.cc forward integral bit for bit but without the Eigen RowMatrixXd heap temporaries Enzyme cannot differentiate. Symmetric (lasym=false) case. Wiring + bit-exact validation against output_quantities follow in this PR.

|B| from total_pressure (sqrt(2|total_pressure - presH|)), guarded Nyquist cos-basis halving (cosmui at m==mnyq, cosnv at n==nnyq, only when nonzero), matching the reference forward transform. New gtest runs solovev and checks all six harmonics (gmnc, bmnc, bsub{u,v}mnc, bsup{u,v}mnc) against output_quantities to 1e-7 of each quantity's scale. PASSED.

VmecModel.qs_harmonics() computes the half-grid field harmonics SIMSOPT's QuasisymmetryRatioResidual needs (gmnc, bmnc, bsub{u,v}mnc, bsup{u,v}mnc) from the current state via ComputeQsHarmonics, with no Eigen heap temporaries (the output_quantities path uses RowMatrixXd). Validated against a full run: gmnc, bmnc, bsupu, bsupv match to ~1e-7; bsubu/bsubv to ~1e-4 (live covariant-B is in internal form; the internal->physical conversion output_quantities applies is a follow-up before wiring the QS objective).

qs_harmonics() now also returns the half-grid profiles SIMSOPT's QuasisymmetryRatioResidual reads (iotas, bvco, buco), plain radial-profile reads (rp.iotaH/bvcoH/bucoH). Validated vs a full run: iotas exact, bvco/buco to ~1e-7. With the six field harmonics, vmecpp now exports every quantity QS needs through a flat-buffer, heap-free path.

…mit pin Bring the QS-export branch up to the corrected CI baseline: - tests.yaml: build VMEC2000 from the pinned source commit and cache the wheel (the prebuilt vmec-0.0.6 wheel fails to import under numpy 2), drop the unused FFTW/HDF5 dev packages. - benchmarks.yaml: skip the result upload on fork PRs (read-only token). - test_simsopt_compat.py: skip vmecpp-only INDATA fields. - CMakeLists: pin abseil to the 20260107.1 commit hash, not the tag. - clang-format the qs_harmonics pybind additions.

The new qs_harmonics_kernel.{h,cc} were not run through clang-format, so the all-files pre-commit hook rejected them. Also correct the header's top |B| formula to the total-pressure form the kernel actually uses.

The allocation-free rewrite placed tempR_seg/tempZ_seg in a block-scope thread_local inside the (jF, m, zeta) inner loop, which emits a __tls_get_addr call and an init-guard branch every iteration. Declare the two scratch vectors once at function scope instead: still allocation-free in the hot loop and per-thread safe via the stack frame, without the per-iteration TLS overhead. Same arithmetic; cma and w7x wout are bit-for-bit unchanged.

Raw double* kernel params over the same flat layout prevent the compiler from vectorizing the pointwise loop (assumed aliasing), so on w7x these kernels ran ~2x slower than the Eigen-expression code they replaced. The buffers never overlap; mark them __restrict to restore SIMD. Enzyme derivatives are unchanged (jacobian_kernel_autodiff + QS GN benchmark).

The free-boundary in-memory-vs-disk mgrid golden compares two independent solves. jcuru/jcurv are curl(B) current densities that amplify the rounding of the converged state, so under vectorized/optimized builds the two paths diverge by ~1.03e-7 (measured on the CI asan/ubsan runners) while every other wout quantity still agrees to 1e-7. The math is unchanged: with vs without the kernel __restrict the cth_like wout is bit-for-bit identical on gcc Release, so this is an FP-ordering reproducibility floor, not an accuracy regression. Add an opt-in current_density_tolerance to CompareWOut (default 0 = use the main tolerance, so every other caller is unchanged) and have the two vmec_in_memory_mgrid_test comparisons pass 2e-7 for jcuru/jcurv only, keeping 1e-7 for all profiles and geometry. (cherry picked from commit 27d36d2)

krystophny added 30 commits June 14, 2026 08:11

Merge branch 'allocation-free-forces' into mhdforce-kernel

e2a667d

Merge branch 'internal-hvp' into exact-hvp-integration

79ef740

krystophny added 16 commits June 14, 2026 13:57

examples: add exact-autodiff-HVP Newton-Krylov solver

48cd127

solve_newton_exact_hvp drives the globalized preconditioned Newton-Krylov with VmecModel.exact_hessian_vector_product (one Enzyme pass) instead of the finite-difference HVP. Requires an Enzyme-enabled build.

Merge branch 'simsopt-adjoint' into exact-hvp-integration

54bbcfd

clang-tidy: drop dead store in composition; use reinterpret_cast in J…

5ad6d37

…VP wrapper

krystophny requested review from jons-pf and jurasic-pf as code owners June 14, 2026 20:50

krystophny marked this pull request as draft June 15, 2026 04:48

krystophny added 2 commits June 15, 2026 07:20

clang-format the QS kernel files; fix stale modB comment

eeba232

The new qs_harmonics_kernel.{h,cc} were not run through clang-format, so the all-files pre-commit hook rejected them. Also correct the header's top |B| formula to the total-pressure form the kernel actually uses.

krystophny mentioned this pull request Jun 15, 2026

enzyme/pybind: exact FD-free dQS/dx quasisymmetry adjoint #585

Open

Merge remote-tracking branch 'upstream/main' into HEAD

19cfbca

krystophny marked this pull request as ready for review June 15, 2026 10:56

krystophny added 3 commits June 16, 2026 08:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ideal_mhd_model: export SIMSOPT-QS quantities via a flat-buffer Enzyme-ready path#584

ideal_mhd_model: export SIMSOPT-QS quantities via a flat-buffer Enzyme-ready path#584
krystophny wants to merge 52 commits into
proximafusion:mainfrom
itpplasma:qs-export-harmonics

krystophny commented Jun 14, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

krystophny commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

krystophny commented Jun 14, 2026 •

edited

Loading