underworldcode · lmoresi · Jun 10, 2026 · Jun 9, 2026 · Jun 9, 2026 · Jun 10, 2026
@@ -238,6 +238,229 @@ now. A true *co-dimension-1* `λ` (boundary-only DOFs end-to-end) remains the
 honest long-term form; the full-domain field + boundary-only reduction is the
 pragmatic path that ships today.
 
+## Conditioning, the augmentation's true role, and rigid-body null spaces
+
+This section records what a focused study of strong **boundary viscosity
+contrast** (annulus lateral `μ_hi ≥ 10³`, SolCx's `1e6` jump) revealed about
+*why* the constraint block is sometimes hard to solve — and corrects an
+over-simplification in the "augmentation is purely a speed knob" framing above.
+
+### The constraint Schur preconditioner (`selfp`, the shipped default)
+
+The grouped `[p, λ]` block is preconditioned through its Schur complement. With
+the base-Stokes default `pc_fieldsplit_schur_precondition = a11`, that
+preconditioner is the assembled `a11` block — a viscosity-scaled **mass** on
+pressure (`1/μ`) and the small screening on `λ`. The pressure part is the right
+scaling (`S_p ≈ μ⁻¹ M_p`); the **`λ` part is not** — the true constraint Schur is
+
+$$S_\lambda = C\,A^{-1}C^{\mathsf T},$$
+
+a boundary operator that scales like `1/μ` but is *not* a simple `λ`-mass. Under
+strong contrast the bare `a11` mass is a poor approximation of `S_λ` and the
+solve walls (or needs a very large augmentation to compensate).
+
+`SNES_Stokes_Constrained` therefore defaults to
+**`pc_fieldsplit_schur_precondition = selfp`** instead. `selfp` forms
+
+$$S \approx A_{11} - A_{10}\,\operatorname{diag}(A_{00})^{-1}A_{01}$$
+
+from the *actual* operator blocks; its `λλ` corner is
+`−C diag(A)⁻¹ Cᵀ`, i.e. the true constraint Schur, **automatically and at no
+extra assembly**. On a smooth lateral viscosity ramp the outer Krylov count is
+flat (≈2–6 iterations) across `μ_hi = 1 … 10⁶`, where the bare `a11` mass climbs
+to ~25 and diverges. Override with
+`solver.petsc_options["pc_fieldsplit_schur_precondition"] = "a11"` if needed.
+
+**Why `selfp` and the `r ∝ μ` augmentation are mutually consistent.** The bare
+constraint Schur is `S₀ = C A⁻¹ Cᵀ ~ 1/μ`. The augmentation adds `r·N` to `A`
+(with `N ~ CᵀC`, the boundary `n⊗n`), so by Woodbury the *augmented* Schur is
+`S_r = C(A + rN)⁻¹Cᵀ ≈ S₀(I + r S₀)⁻¹` — equal to `S₀ ~ 1/μ` for small `r` and
+tending to `1/r` for large `r`, with the **crossover at `r·S₀ ~ 1`, i.e. `r ~ μ`**.
+So the default `r = augmentation_base · μ(x)` is the AL-natural (crossover)
+scaling. And `selfp` builds its Schur from `diag(A + rN)`, whose boundary diagonal
+is `μ + r = μ(1 + augmentation_base)` — so `selfp`'s `λλ` block `~ 1/r`
+**automatically tracks the augmented Schur, because `diag(A)` already contains the
+penalty.** This is *why* they compose so well: the approximate inverse "sees" the
+augmentation. A **constant** `r` (instead of `∝ μ`) was tested and is strictly
+worse — it is negligible on the stiff side (`r ≪ μ`, no regularisation) *and*
+breaks the uniform `diag(A)` scaling that `selfp` relies on, giving garbage
+velocity at SolCx 1e6 (`velerr 1–12` across `r = 10²–10⁴`, vs `1.3e-4` for
+`r ∝ μ`). The `μ`-weighting is load-bearing; keep it.
+
+### The augmentation has two roles — conditioning *and* null-space regularisation
+
+The earlier sections describe `r` as a *speed knob* that conditions the `[p, λ]`
+Schur complement. That is correct, but incomplete. `r(n⊗n)` does **two**
+independent things:
+
+1. **Conditions `S_λ` from the velocity (A) side.** Stiffening the boundary
+   velocity is one way to make the constraint Schur well-behaved. `selfp`
+   conditions the *same* operator from the Schur side. They are **substitutes**:
+   on a viscosity-contrast sweep, increasing `r` *or* switching to `selfp` both
+   collapse the iteration count. With `selfp` the augmentation can drop to zero on
+   a **single-constraint** problem and the solve still converges.
+
+2. **Regularises the velocity block's rigid-body null space** — a *structural*
+   role that `selfp` alone cannot fill (see below). This is why an all-free-slip
+   enclosed box still needs `r > 0` even with `selfp`.
+
+So `r` is not "just an accelerator". On problems with a velocity anchor (a no-slip
+patch) it is optional with `selfp`; on problems with **no** velocity anchor it is
+doing genuine null-space regularisation.
+
+### Rigid-body null spaces: a general principle (not specific to constraints)
+
+The velocity operator `A = ∫ 2μ ε(u):ε(v)` has the **zero-strain rigid-body
+motions in its kernel for *any* mesh** — **2 translations + 1 rotation in 2D, 3 +
+3 in 3D** — removed *only* by Dirichlet velocity DOFs. In a monolithic direct
+solve this is hidden, but a **Schur-factored** solve inverts `A` as an *inner*
+block that sees only `A` (not the constraint rows or the pressure), so the
+singularity surfaces there.
+
+```{important}
+Whenever **nothing pins velocity anywhere** — all free-slip, all-natural, the
+free-slip spherical shell — the inner `A`-solve is singular along the rigid-body
+modes. The constraints kill those modes in the *full* system but never in the
+*inner* `A`-solve, so it can drift before the outer coupling acts. In the
+**block-constrained** solver the working remedy is the **augmentation**
+`r(n⊗n)`: it stiffens those modes out of `A` directly, so the inner solve is
+well-posed. This is a *third*, structural role of the augmentation (beyond
+conditioning the Schur complement), and it is why an all-free-slip enclosed
+problem needs `r > 0` even with `selfp`.
+```
+
+```{warning}
+**Do not** try to remove the inner-`A` singularity by attaching the rigid-body
+modes to the *coupled* null space (`petsc_velocity_nullspace_basis`) on a
+**constrained** problem — neither the rotation nor the translations work cleanly:
+
+- The **rotation** carries a *real* part of the answer: a closed `u·n=0`
+  incompressible flow generically has nonzero circulation `∫(x u_y − y u_x) dV`,
+  so it is not orthogonal to the rotation mode. Projecting it out corrupts the
+  velocity (measured: SolCx velerr `3e-1`; with all three modes, `≈1.0`).
+- The **translations** *are* orthogonal to the solution in **L2**
+  (`∫u_i dV = 0` for incompressible `u·n=0` on a closed boundary), so in principle
+  they are harmless to suppress. But PETSc projects against the **Euclidean nodal**
+  inner product, and the unweighted nodal sum `Σ u_i` is not zero even when the
+  integral is — so projecting the constant mode still injects error (SolCx velerr
+  `1e-4 → 1.4e-2`). Honouring the L2 orthogonality needs **mass-weighted** null
+  vectors.
+
+Attaching the modes to the **velocity sub-block** instead (the inner solve) was
+prototyped and is *less wrong* than the coupled route — at `aug=10⁴` it degrades
+SolCx velocity to `4e-3` rather than destroying it (`≈1.0`) — but it still does
+not match the augmentation (`1.3e-4`) and does not enable small/zero `aug`. The
+reason is that `selfp` builds its Schur preconditioner from `diag(A)`, which is
+**not** rank-deficient even when `A` is, so the preconditioner is blind to the
+sub-block null space. A clean version would need the null space attached natively
+(`DMSetNullSpaceConstructor`, before setup) *and* a Schur PC that respects it —
+an unproven, non-trivial enhancement. **In practice the augmentation is the
+working remedy for unanchored free-slip** (it stiffens the modes out of `A`
+directly, with no pseudo-inverse or inner-product subtleties). Note
+`petsc_velocity_nullspace_basis` remains correct for a genuine *coupled* null
+mode, e.g. the rigid **rotation** of a free-slip spherical shell solved *without*
+a constraint multiplier.
+```
+
+The buoyancy-driven annulus escapes the singularity entirely because its
+**no-slip inner boundary** already pins all rigid-body modes in `A`; there `aug`
+is fully optional (aug=0 and aug=10⁴ velocities agree to `3e-4`) and small `aug`
+gives clean velocity, pressure, *and* topography at once.
+
+One further consequence:
+
+- The bare velocity rigid-body modes are the **GAMG near-null space** of the
+  velocity block. Set as a *near*-null space on that sub-block
+  (`MatSetNearNullSpace`, distinct from the coupled-operator null-space projection
+  warned against above) they improve the velocity multigrid coarse spaces —
+  standard elasticity/Stokes-multigrid practice — independent of the singularity
+  question. This is the same sub-block plumbing the clean inner-`A` fix would use.
+
+### Solver type at extreme contrast
+
+A fixed-viscosity Stokes solve is **linear**. The default `newtonls` performs an
+inexact-Newton defect correction; when the Schur approximation is stiff (extreme
+contrast) it can take many "Newton" steps or stall, even though each linear solve
+is cheap. Using `snes_type = "ksponly"` solves the linear system directly and is
+markedly more robust (and faster) for constrained free-slip at strong contrast.
+
+### Topography: the mixed / penalty / augmented triad
+
+The boundary constraint mirrors the **incompressibility** triad, with `λ` playing
+the role of `p` (`λ : u·n=g  ::  p : ∇·u=0`):
+
+| formulation | analogue | topography signal |
+|---|---|---|
+| **mixed** (`r = 0`, multiplier) | Taylor–Hood `p` (a real DOF) | `λ` *is* `−σ_nn`, directly |
+| **penalty** (no `λ`, large `r`) | `p = −λ_pen ∇·u` | `−r(u·n)` (recovered from the residual) |
+| **augmented** (`λ + r`) | Uzawa / ALG2 | `−σ_nn = λ + r(u·n)` — a **mix** |
+
+The δu equation gives the boundary traction `−(λ + r(u·n − g))`, so the recovered
+topography splits between the multiplier `λ` and the penalty term `r(u·n)`. The
+constraint residual `u·n ≈` (FE/KSP tolerance) is roughly `r`-independent, so the
+penalty share `r(u·n)` grows **linearly with `r`** (measured: ≈`10⁻⁴`, `10⁻²`,
+`1.0` of the signal at `r/μ = 1, 10², 10⁴`). At small `r`, `λ` *is* the clean
+topography (`corr(λ, −σ_nn) = 0.99995`); at large `r` it inflates and the
+correlation degrades. Crucially the penalty term is pointwise **noise** (`r ×`
+the noisy residual), so adding it back does **not** recover a cleaner signal —
+the fix is to use a **smaller `r`**.
+
+This used to be a trade-off (large `r` for conditioning vs small `r` for clean
+topography). **`selfp` breaks it**: it conditions the constraint Schur from the
+operator side, so a small (or, with a velocity anchor, zero) augmentation gives
+both a well-conditioned solve *and* a clean multiplier. For topography work, keep
+a *modest* `r` for constraint *tightness* (`u·n → 0`), not for conditioning.
+
+### Is there a workable-AL / clean-topography overlap? (the unanchored case)
+
+When the velocity is **unanchored** (all free-slip, no Dirichlet) the augmentation
+is *required* for rigid-body regularisation, so it has a **floor**; topography
+contamination gives it a **ceiling**. The two move oppositely with the boundary
+viscosity contrast `μ`:
+
+- the **regularisation floor** *rises* with `μ` (a stiffer inner-`A` needs more
+  `r(n⊗n)` to lift the rigid modes) — `aug ≲ 1` at `μ=10²`, `aug ~ 10³` at `μ=10⁶`;
+- the **topography ceiling** *falls* like `1/μ` (the penalty noise is
+  `r·(u·n) = aug·μ·ε`).
+
+Measured on the SolCx 4-wall (unanchored), velocity-vs-analytic and
+`corr(λ, −σ_nn)` together:
+
+| regime | overlap |
+|---|---|
+| **anchored** (no-slip core/base, any `μ`) | floor = 0; `aug→0` gives clean `u`, `p`, *and* topography — overlap is everything |
+| **unanchored, moderate `μ` (≲10⁴–10⁵)** | **wide** — the whole `aug ∈ [1, 10⁴]` range is accurate *and* `corr(λ,CBF)=1.0000` (`μ=10²`) |
+| **unanchored, extreme `μ` (~10⁶)** | **none** — regularisation needs `aug≥10³`, where topography has collapsed (`corr 0.7–0.9`); the seemingly-clean low-`aug` corr is spurious (garbage `u` ↔ garbage CBF) |
+
+So the conflict appears only in the corner of *fully-unanchored* geometry **and**
+`~10⁶` contrast. There, lean on the block solver's exact constraint enforcement
+(`RMS(u·n) ~ 10⁻⁹`) and recover topography from the consistent-boundary-flux
+stress `−n·σ·n` (no `r`-amplification), not from `λ` directly.
+
+### The same story for the interior incompressibility penalty
+
+The boundary constraint `u·n=g` (multiplier `λ`) is the surface analogue of the
+interior constraint `∇·u=0` (multiplier `p`). UW3's Stokes carries an **optional
+augmented-Lagrangian grad-div penalty** for incompressibility, `λ ∫ μ (∇·u)(∇·v)`,
+on by setting `solver.penalty` (default `0`). Everything above transfers:
+
+- **Scaling.** The penalty is multiplied by the local viscosity `μ`
+  (`constitutive_model.K`) — exactly as the boundary AL is `∝ μ` — so the ratio
+  penalty/`μ` stays uniform. The `penalty` parameter is therefore a
+  *dimensionless* `O(1)` base. A bare constant (the previous behaviour) over-
+  stiffens low-`μ` regions into velocity locking under contrast (measured: a
+  constant `100` gave SolCx velocity error `0.1`, while the `μ`-scaled `O(1)`
+  penalty gives `~10⁻⁵`).
+- **Pressure correction.** Because the penalty sits in the *operator*, the
+  recovered `p` is the multiplier, not the mechanical pressure:
+  `p_mech = p − penalty·μ·(∇·u)`. At convergence they agree; pointwise they differ
+  by ≈ a couple of percent of `|p|` at `penalty = O(1)`. For a pressure-dependent
+  constitutive law use `p_mech`; for visualisation, raw `p` is adequate.
+- **Usually unnecessary.** The `1/μ` Schur preconditioner already conditions the
+  pressure block (outer KSP is unchanged with or without the penalty), so the
+  default `penalty = 0` — where `p` is the clean physical pressure and needs no
+  correction — is the right starting point.
+
 ## Files
 
 - `src/underworld3/systems/solvers.py` — `SNES_Stokes_Constrained`, `_BlockConstraintBC`.

@@ -4156,6 +4156,24 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
         self._multipliers = []
         self._multiplier_screening = []
         self._block_constraint_bcs = []
+        # Give the Lagrange-multiplier (lambda) block its own viscosity-scaled
+        # Schur preconditioner. The constraint Schur complement S_lambda = C A^-1 C^T
+        # scales as 1/mu (since A ~ mu K), exactly like the pressure Schur S_p ~ mu^-1 M_p
+        # and independent of the augmentation r. When True, the lambda block's
+        # PRECONDITIONER mass (Pmat only) uses 1/mu (= saddle_preconditioner) while the
+        # true operator (Amat) (lambda,lambda) block stays = eps (exact Newton). This is
+        # the boundary-trace analog of _pp_G0 = 1/mu on pressure, and decouples
+        # convergence from r so r can shrink -> cleaner recovered lambda (= topography).
+        # When on, the lambda block reuses pressure's already-compiled _pp_G0 = 1/mu
+        # for its preconditioner block (same scaling, no extra JIT term). On uniform mu
+        # (fieldsplit) it is bit-identical; on moderate contrast it cracks the wall and
+        # makes the augmentation optional. Default OFF (opt-in): a monolithic lu solve
+        # factorizes the Pmat (pc_use_amat is a no-op here), so this Pmat term is NOT
+        # inert for direct solves — flipping the default on regresses the direct-lu
+        # constraint enforcement. See the design note
+        # docs/developer/design/CONSTRAINED_FREESLIP_MULTIPLIER.md (the warning on the
+        # coupled vs sub-block preconditioner placement).
+        self._multiplier_schur_pc = False
         # Pin interior (off-boundary) multiplier DOFs to 0 so the solved [p,h]
         # block carries only the boundary trace (~√ndof instead of ~ndof/3 DOFs);
         # the boundary trace is the only physical part. Done by constraining the
@@ -4802,8 +4820,48 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
     def petsc_use_pressure_nullspace(self, value):
         self._petsc_use_pressure_nullspace = bool(value)
         self._reset_stokes_nullspace()
+        # Force re-setup so the changed nullspace is re-registered with PETSc if
+        # this is toggled after the solver has already been set up (matches
+        # `petsc_use_nullspace`; `_attach_stokes_nullspace` runs inside the
+        # is_setup-gated `_setup_solver`).
         self.is_setup = False
 
+    @property
+    def multiplier_schur_pc(self):
+        """
+        Give each Lagrange-multiplier (constraint) block its own viscosity-scaled
+        Schur preconditioner.
+
+        The constraint Schur complement ``S_lambda = C A^-1 C^T`` scales as
+        ``1/mu`` (since ``A ~ mu K``), exactly like the pressure Schur
+        ``S_p ~ mu^-1 M_p`` and **independent of the augmentation** ``r``. When
+        enabled, the multiplier block's *preconditioner* (Pmat) diagonal reuses
+        pressure's ``1/mu`` mass while the true operator (Amat) block stays the
+        screening ``eps`` — so Newton is unchanged and this is a pure
+        preconditioner term (the boundary-trace analog of the pressure
+        ``saddle_preconditioner``).
+
+        Effect: convergence decouples from ``r``. On moderate boundary viscosity
+        contrast (e.g. annulus mu_hi ~ 1e3) the augmentation becomes optional
+        (``augmentation=0`` converges) and the recovered multiplier (= dynamic
+        topography) is cleaner at small ``r``. On extreme contrast (eta jump 1e6)
+        a small augmentation floor is still needed, but the requirement shrinks by
+        orders of magnitude. Pair with ``snes_type='ksponly'`` for linear Stokes —
+        the block solver's default ``newtonls`` defect-corrects a linear system in
+        many steps when the Schur approximation is stiff.
+
+        Default ``False`` (opt-in). On the fieldsplit/iterative path it is
+        bit-identical on uniform ``mu`` and cracks the moderate-contrast wall;
+        but a monolithic ``lu`` solve factorizes the Pmat (``pc_use_amat`` is a
+        no-op there), so this term is not inert for direct solves — hence opt-in.
+        """
+        return self._multiplier_schur_pc
+
+    @multiplier_schur_pc.setter
+    def multiplier_schur_pc(self, value):
+        self._multiplier_schur_pc = bool(value)
+        self.is_setup = False   # force DS re-registration on next solve
+
     @property
     def petsc_velocity_nullspace_basis(self):
         """
@@ -5938,9 +5996,16 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
         # diagonal mass Jacobian/preconditioner on each multiplier's field.
         for k, mvar in enumerate(self._multipliers):
             fid = mvar._solver_field_id
+            # Operator (Amat) (lambda,lambda) block is ALWAYS the true screening eps so
+            # Newton stays exact; only the preconditioner (Pmat) block swaps to the 1/mu
+            # Schur mass when _multiplier_schur_pc is on (pure-Pmat, can't corrupt Newton —
+            # the exact analog of pressure's _pp_G0 living only in JacobianPreconditioner).
+            # Reuse pressure's already-compiled _pp_G0 (= 1/mu, same scaling) so there is
+            # NO extra JIT term (the 1/mu Piecewise of SolCx is otherwise compiled twice).
+            hh_pc = self._pp_G0 if self._multiplier_schur_pc else self._hh_G0[k]
             PetscDSSetResidual(ds.ds, fid, ext.fns_residual[i_res[self._h_F0[k]]], NULL)
             PetscDSSetJacobian(              ds.ds, fid, fid, ext.fns_jacobian[i_jac[self._hh_G0[k]]], NULL, NULL, NULL)
-            PetscDSSetJacobianPreconditioner(ds.ds, fid, fid, ext.fns_jacobian[i_jac[self._hh_G0[k]]], NULL, NULL, NULL)
+            PetscDSSetJacobianPreconditioner(ds.ds, fid, fid, ext.fns_jacobian[i_jac[hh_pc]], NULL, NULL, NULL)
 
         cdef DMLabel c_label