Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
223 changes: 223 additions & 0 deletions docs/developer/design/CONSTRAINED_FREESLIP_MULTIPLIER.md
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,229 @@ now. A true *co-dimension-1* `Ξ»` (boundary-only DOFs end-to-end) remains the
honest long-term form; the full-domain field + boundary-only reduction is the
pragmatic path that ships today.

## Conditioning, the augmentation's true role, and rigid-body null spaces

This section records what a focused study of strong **boundary viscosity
contrast** (annulus lateral `ΞΌ_hi β‰₯ 10Β³`, SolCx's `1e6` jump) revealed about
*why* the constraint block is sometimes hard to solve β€” and corrects an
over-simplification in the "augmentation is purely a speed knob" framing above.

### The constraint Schur preconditioner (`selfp`, the shipped default)

The grouped `[p, Ξ»]` block is preconditioned through its Schur complement. With
the base-Stokes default `pc_fieldsplit_schur_precondition = a11`, that
preconditioner is the assembled `a11` block β€” a viscosity-scaled **mass** on
pressure (`1/ΞΌ`) and the small screening on `Ξ»`. The pressure part is the right
scaling (`S_p β‰ˆ μ⁻¹ M_p`); the **`Ξ»` part is not** β€” the true constraint Schur is

$$S_\lambda = C\,A^{-1}C^{\mathsf T},$$

a boundary operator that scales like `1/ΞΌ` but is *not* a simple `Ξ»`-mass. Under
strong contrast the bare `a11` mass is a poor approximation of `S_Ξ»` and the
solve walls (or needs a very large augmentation to compensate).

`SNES_Stokes_Constrained` therefore defaults to
**`pc_fieldsplit_schur_precondition = selfp`** instead. `selfp` forms

$$S \approx A_{11} - A_{10}\,\operatorname{diag}(A_{00})^{-1}A_{01}$$

from the *actual* operator blocks; its `λλ` corner is
`βˆ’C diag(A)⁻¹ Cα΅€`, i.e. the true constraint Schur, **automatically and at no
extra assembly**. On a smooth lateral viscosity ramp the outer Krylov count is
flat (β‰ˆ2–6 iterations) across `ΞΌ_hi = 1 … 10⁢`, where the bare `a11` mass climbs
to ~25 and diverges. Override with
`solver.petsc_options["pc_fieldsplit_schur_precondition"] = "a11"` if needed.

**Why `selfp` and the `r ∝ μ` augmentation are mutually consistent.** The bare
constraint Schur is `Sβ‚€ = C A⁻¹ Cα΅€ ~ 1/ΞΌ`. The augmentation adds `rΒ·N` to `A`
(with `N ~ Cα΅€C`, the boundary `nβŠ—n`), so by Woodbury the *augmented* Schur is
`S_r = C(A + rN)⁻¹Cα΅€ β‰ˆ Sβ‚€(I + r Sβ‚€)⁻¹` β€” equal to `Sβ‚€ ~ 1/ΞΌ` for small `r` and
tending to `1/r` for large `r`, with the **crossover at `rΒ·Sβ‚€ ~ 1`, i.e. `r ~ ΞΌ`**.
So the default `r = augmentation_base Β· ΞΌ(x)` is the AL-natural (crossover)
scaling. And `selfp` builds its Schur from `diag(A + rN)`, whose boundary diagonal
is `ΞΌ + r = ΞΌ(1 + augmentation_base)` β€” so `selfp`'s `λλ` block `~ 1/r`
**automatically tracks the augmented Schur, because `diag(A)` already contains the
penalty.** This is *why* they compose so well: the approximate inverse "sees" the
augmentation. A **constant** `r` (instead of `∝ μ`) was tested and is strictly
worse β€” it is negligible on the stiff side (`r β‰ͺ ΞΌ`, no regularisation) *and*
breaks the uniform `diag(A)` scaling that `selfp` relies on, giving garbage
velocity at SolCx 1e6 (`velerr 1–12` across `r = 10²–10⁴`, vs `1.3e-4` for
`r ∝ μ`). The `μ`-weighting is load-bearing; keep it.

### The augmentation has two roles β€” conditioning *and* null-space regularisation

The earlier sections describe `r` as a *speed knob* that conditions the `[p, Ξ»]`
Schur complement. That is correct, but incomplete. `r(nβŠ—n)` does **two**
independent things:

1. **Conditions `S_Ξ»` from the velocity (A) side.** Stiffening the boundary
velocity is one way to make the constraint Schur well-behaved. `selfp`
conditions the *same* operator from the Schur side. They are **substitutes**:
on a viscosity-contrast sweep, increasing `r` *or* switching to `selfp` both
collapse the iteration count. With `selfp` the augmentation can drop to zero on
a **single-constraint** problem and the solve still converges.

2. **Regularises the velocity block's rigid-body null space** β€” a *structural*
role that `selfp` alone cannot fill (see below). This is why an all-free-slip
enclosed box still needs `r > 0` even with `selfp`.

So `r` is not "just an accelerator". On problems with a velocity anchor (a no-slip
patch) it is optional with `selfp`; on problems with **no** velocity anchor it is
doing genuine null-space regularisation.

### Rigid-body null spaces: a general principle (not specific to constraints)

The velocity operator `A = ∫ 2μ Ρ(u):Ρ(v)` has the **zero-strain rigid-body
motions in its kernel for *any* mesh** β€” **2 translations + 1 rotation in 2D, 3 +
3 in 3D** β€” removed *only* by Dirichlet velocity DOFs. In a monolithic direct
solve this is hidden, but a **Schur-factored** solve inverts `A` as an *inner*
block that sees only `A` (not the constraint rows or the pressure), so the
singularity surfaces there.

```{important}
Whenever **nothing pins velocity anywhere** β€” all free-slip, all-natural, the
free-slip spherical shell β€” the inner `A`-solve is singular along the rigid-body
modes. The constraints kill those modes in the *full* system but never in the
*inner* `A`-solve, so it can drift before the outer coupling acts. In the
**block-constrained** solver the working remedy is the **augmentation**
`r(nβŠ—n)`: it stiffens those modes out of `A` directly, so the inner solve is
well-posed. This is a *third*, structural role of the augmentation (beyond
conditioning the Schur complement), and it is why an all-free-slip enclosed
problem needs `r > 0` even with `selfp`.
```

```{warning}
**Do not** try to remove the inner-`A` singularity by attaching the rigid-body
modes to the *coupled* null space (`petsc_velocity_nullspace_basis`) on a
**constrained** problem β€” neither the rotation nor the translations work cleanly:

- The **rotation** carries a *real* part of the answer: a closed `uΒ·n=0`
incompressible flow generically has nonzero circulation `∫(x u_y βˆ’ y u_x) dV`,
so it is not orthogonal to the rotation mode. Projecting it out corrupts the
velocity (measured: SolCx velerr `3e-1`; with all three modes, `β‰ˆ1.0`).
- The **translations** *are* orthogonal to the solution in **L2**
(`∫u_i dV = 0` for incompressible `u·n=0` on a closed boundary), so in principle
they are harmless to suppress. But PETSc projects against the **Euclidean nodal**
inner product, and the unweighted nodal sum `Ξ£ u_i` is not zero even when the
integral is β€” so projecting the constant mode still injects error (SolCx velerr
`1e-4 β†’ 1.4e-2`). Honouring the L2 orthogonality needs **mass-weighted** null
vectors.

Attaching the modes to the **velocity sub-block** instead (the inner solve) was
prototyped and is *less wrong* than the coupled route β€” at `aug=10⁴` it degrades
SolCx velocity to `4e-3` rather than destroying it (`β‰ˆ1.0`) β€” but it still does
not match the augmentation (`1.3e-4`) and does not enable small/zero `aug`. The
reason is that `selfp` builds its Schur preconditioner from `diag(A)`, which is
**not** rank-deficient even when `A` is, so the preconditioner is blind to the
sub-block null space. A clean version would need the null space attached natively
(`DMSetNullSpaceConstructor`, before setup) *and* a Schur PC that respects it β€”
an unproven, non-trivial enhancement. **In practice the augmentation is the
working remedy for unanchored free-slip** (it stiffens the modes out of `A`
directly, with no pseudo-inverse or inner-product subtleties). Note
`petsc_velocity_nullspace_basis` remains correct for a genuine *coupled* null
mode, e.g. the rigid **rotation** of a free-slip spherical shell solved *without*
a constraint multiplier.
```

The buoyancy-driven annulus escapes the singularity entirely because its
**no-slip inner boundary** already pins all rigid-body modes in `A`; there `aug`
is fully optional (aug=0 and aug=10⁴ velocities agree to `3e-4`) and small `aug`
gives clean velocity, pressure, *and* topography at once.

One further consequence:

- The bare velocity rigid-body modes are the **GAMG near-null space** of the
velocity block. Set as a *near*-null space on that sub-block
(`MatSetNearNullSpace`, distinct from the coupled-operator null-space projection
warned against above) they improve the velocity multigrid coarse spaces β€”
standard elasticity/Stokes-multigrid practice β€” independent of the singularity
question. This is the same sub-block plumbing the clean inner-`A` fix would use.

### Solver type at extreme contrast

A fixed-viscosity Stokes solve is **linear**. The default `newtonls` performs an
inexact-Newton defect correction; when the Schur approximation is stiff (extreme
contrast) it can take many "Newton" steps or stall, even though each linear solve
is cheap. Using `snes_type = "ksponly"` solves the linear system directly and is
markedly more robust (and faster) for constrained free-slip at strong contrast.

### Topography: the mixed / penalty / augmented triad

The boundary constraint mirrors the **incompressibility** triad, with `Ξ»` playing
the role of `p` (`Ξ» : uΒ·n=g :: p : βˆ‡Β·u=0`):

| formulation | analogue | topography signal |
|---|---|---|
| **mixed** (`r = 0`, multiplier) | Taylor–Hood `p` (a real DOF) | `Ξ»` *is* `βˆ’Οƒ_nn`, directly |
| **penalty** (no `Ξ»`, large `r`) | `p = βˆ’Ξ»_pen βˆ‡Β·u` | `βˆ’r(uΒ·n)` (recovered from the residual) |
| **augmented** (`Ξ» + r`) | Uzawa / ALG2 | `βˆ’Οƒ_nn = Ξ» + r(uΒ·n)` β€” a **mix** |

The Ξ΄u equation gives the boundary traction `βˆ’(Ξ» + r(uΒ·n βˆ’ g))`, so the recovered
topography splits between the multiplier `Ξ»` and the penalty term `r(uΒ·n)`. The
constraint residual `uΒ·n β‰ˆ` (FE/KSP tolerance) is roughly `r`-independent, so the
penalty share `r(uΒ·n)` grows **linearly with `r`** (measured: β‰ˆ`10⁻⁴`, `10⁻²`,
`1.0` of the signal at `r/μ = 1, 10², 10⁴`). At small `r`, `λ` *is* the clean
topography (`corr(Ξ», βˆ’Οƒ_nn) = 0.99995`); at large `r` it inflates and the
correlation degrades. Crucially the penalty term is pointwise **noise** (`r Γ—`
the noisy residual), so adding it back does **not** recover a cleaner signal β€”
the fix is to use a **smaller `r`**.

This used to be a trade-off (large `r` for conditioning vs small `r` for clean
topography). **`selfp` breaks it**: it conditions the constraint Schur from the
operator side, so a small (or, with a velocity anchor, zero) augmentation gives
both a well-conditioned solve *and* a clean multiplier. For topography work, keep
a *modest* `r` for constraint *tightness* (`uΒ·n β†’ 0`), not for conditioning.

### Is there a workable-AL / clean-topography overlap? (the unanchored case)

When the velocity is **unanchored** (all free-slip, no Dirichlet) the augmentation
is *required* for rigid-body regularisation, so it has a **floor**; topography
contamination gives it a **ceiling**. The two move oppositely with the boundary
viscosity contrast `ΞΌ`:

- the **regularisation floor** *rises* with `ΞΌ` (a stiffer inner-`A` needs more
`r(nβŠ—n)` to lift the rigid modes) β€” `aug ≲ 1` at `ΞΌ=10Β²`, `aug ~ 10Β³` at `ΞΌ=10⁢`;
- the **topography ceiling** *falls* like `1/ΞΌ` (the penalty noise is
`rΒ·(uΒ·n) = augΒ·ΞΌΒ·Ξ΅`).

Measured on the SolCx 4-wall (unanchored), velocity-vs-analytic and
`corr(Ξ», βˆ’Οƒ_nn)` together:

| regime | overlap |
|---|---|
| **anchored** (no-slip core/base, any `μ`) | floor = 0; `aug→0` gives clean `u`, `p`, *and* topography — overlap is everything |
| **unanchored, moderate `ΞΌ` (≲10⁴–10⁡)** | **wide** β€” the whole `aug ∈ [1, 10⁴]` range is accurate *and* `corr(Ξ»,CBF)=1.0000` (`ΞΌ=10Β²`) |
| **unanchored, extreme `ΞΌ` (~10⁢)** | **none** β€” regularisation needs `augβ‰₯10Β³`, where topography has collapsed (`corr 0.7–0.9`); the seemingly-clean low-`aug` corr is spurious (garbage `u` ↔ garbage CBF) |

So the conflict appears only in the corner of *fully-unanchored* geometry **and**
`~10⁢` contrast. There, lean on the block solver's exact constraint enforcement
(`RMS(u·n) ~ 10⁻⁹`) and recover topography from the consistent-boundary-flux
stress `βˆ’n·σ·n` (no `r`-amplification), not from `Ξ»` directly.

### The same story for the interior incompressibility penalty

The boundary constraint `uΒ·n=g` (multiplier `Ξ»`) is the surface analogue of the
interior constraint `βˆ‡Β·u=0` (multiplier `p`). UW3's Stokes carries an **optional
augmented-Lagrangian grad-div penalty** for incompressibility, `Ξ» ∫ ΞΌ (βˆ‡Β·u)(βˆ‡Β·v)`,
on by setting `solver.penalty` (default `0`). Everything above transfers:

- **Scaling.** The penalty is multiplied by the local viscosity `ΞΌ`
(`constitutive_model.K`) β€” exactly as the boundary AL is `∝ ΞΌ` β€” so the ratio
penalty/`ΞΌ` stays uniform. The `penalty` parameter is therefore a
*dimensionless* `O(1)` base. A bare constant (the previous behaviour) over-
stiffens low-`ΞΌ` regions into velocity locking under contrast (measured: a
constant `100` gave SolCx velocity error `0.1`, while the `ΞΌ`-scaled `O(1)`
penalty gives `~10⁻⁡`).
- **Pressure correction.** Because the penalty sits in the *operator*, the
recovered `p` is the multiplier, not the mechanical pressure:
`p_mech = p βˆ’ penaltyΒ·ΞΌΒ·(βˆ‡Β·u)`. At convergence they agree; pointwise they differ
by β‰ˆ a couple of percent of `|p|` at `penalty = O(1)`. For a pressure-dependent
constitutive law use `p_mech`; for visualisation, raw `p` is adequate.
- **Usually unnecessary.** The `1/ΞΌ` Schur preconditioner already conditions the
pressure block (outer KSP is unchanged with or without the penalty), so the
default `penalty = 0` β€” where `p` is the clean physical pressure and needs no
correction β€” is the right starting point.

## Files

- `src/underworld3/systems/solvers.py` β€” `SNES_Stokes_Constrained`, `_BlockConstraintBC`.
Expand Down
67 changes: 66 additions & 1 deletion src/underworld3/cython/petsc_generic_snes_solvers.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -4156,6 +4156,24 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
self._multipliers = []
self._multiplier_screening = []
self._block_constraint_bcs = []
# Give the Lagrange-multiplier (lambda) block its own viscosity-scaled
# Schur preconditioner. The constraint Schur complement S_lambda = C A^-1 C^T
# scales as 1/mu (since A ~ mu K), exactly like the pressure Schur S_p ~ mu^-1 M_p
# and independent of the augmentation r. When True, the lambda block's
# PRECONDITIONER mass (Pmat only) uses 1/mu (= saddle_preconditioner) while the
# true operator (Amat) (lambda,lambda) block stays = eps (exact Newton). This is
# the boundary-trace analog of _pp_G0 = 1/mu on pressure, and decouples
# convergence from r so r can shrink -> cleaner recovered lambda (= topography).
# When on, the lambda block reuses pressure's already-compiled _pp_G0 = 1/mu
# for its preconditioner block (same scaling, no extra JIT term). On uniform mu
# (fieldsplit) it is bit-identical; on moderate contrast it cracks the wall and
# makes the augmentation optional. Default OFF (opt-in): a monolithic lu solve
# factorizes the Pmat (pc_use_amat is a no-op here), so this Pmat term is NOT
# inert for direct solves β€” flipping the default on regresses the direct-lu
# constraint enforcement. See the design note
# docs/developer/design/CONSTRAINED_FREESLIP_MULTIPLIER.md (the warning on the
# coupled vs sub-block preconditioner placement).
self._multiplier_schur_pc = False
# Pin interior (off-boundary) multiplier DOFs to 0 so the solved [p,h]
# block carries only the boundary trace (~√ndof instead of ~ndof/3 DOFs);
# the boundary trace is the only physical part. Done by constraining the
Expand Down Expand Up @@ -4802,8 +4820,48 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
def petsc_use_pressure_nullspace(self, value):
self._petsc_use_pressure_nullspace = bool(value)
self._reset_stokes_nullspace()
# Force re-setup so the changed nullspace is re-registered with PETSc if
# this is toggled after the solver has already been set up (matches
# `petsc_use_nullspace`; `_attach_stokes_nullspace` runs inside the
# is_setup-gated `_setup_solver`).
self.is_setup = False

@property
def multiplier_schur_pc(self):
"""
Give each Lagrange-multiplier (constraint) block its own viscosity-scaled
Schur preconditioner.

The constraint Schur complement ``S_lambda = C A^-1 C^T`` scales as
``1/mu`` (since ``A ~ mu K``), exactly like the pressure Schur
``S_p ~ mu^-1 M_p`` and **independent of the augmentation** ``r``. When
enabled, the multiplier block's *preconditioner* (Pmat) diagonal reuses
pressure's ``1/mu`` mass while the true operator (Amat) block stays the
screening ``eps`` β€” so Newton is unchanged and this is a pure
preconditioner term (the boundary-trace analog of the pressure
``saddle_preconditioner``).

Effect: convergence decouples from ``r``. On moderate boundary viscosity
contrast (e.g. annulus mu_hi ~ 1e3) the augmentation becomes optional
(``augmentation=0`` converges) and the recovered multiplier (= dynamic
topography) is cleaner at small ``r``. On extreme contrast (eta jump 1e6)
a small augmentation floor is still needed, but the requirement shrinks by
orders of magnitude. Pair with ``snes_type='ksponly'`` for linear Stokes β€”
the block solver's default ``newtonls`` defect-corrects a linear system in
many steps when the Schur approximation is stiff.

Default ``False`` (opt-in). On the fieldsplit/iterative path it is
bit-identical on uniform ``mu`` and cracks the moderate-contrast wall;
but a monolithic ``lu`` solve factorizes the Pmat (``pc_use_amat`` is a
no-op there), so this term is not inert for direct solves β€” hence opt-in.
"""
return self._multiplier_schur_pc

@multiplier_schur_pc.setter
def multiplier_schur_pc(self, value):
self._multiplier_schur_pc = bool(value)
self.is_setup = False # force DS re-registration on next solve

@property
def petsc_velocity_nullspace_basis(self):
"""
Expand Down Expand Up @@ -5938,9 +5996,16 @@ class SNES_Stokes_SaddlePt(SolverBaseClass):
# diagonal mass Jacobian/preconditioner on each multiplier's field.
for k, mvar in enumerate(self._multipliers):
fid = mvar._solver_field_id
# Operator (Amat) (lambda,lambda) block is ALWAYS the true screening eps so
# Newton stays exact; only the preconditioner (Pmat) block swaps to the 1/mu
# Schur mass when _multiplier_schur_pc is on (pure-Pmat, can't corrupt Newton β€”
# the exact analog of pressure's _pp_G0 living only in JacobianPreconditioner).
# Reuse pressure's already-compiled _pp_G0 (= 1/mu, same scaling) so there is
# NO extra JIT term (the 1/mu Piecewise of SolCx is otherwise compiled twice).
hh_pc = self._pp_G0 if self._multiplier_schur_pc else self._hh_G0[k]
PetscDSSetResidual(ds.ds, fid, ext.fns_residual[i_res[self._h_F0[k]]], NULL)
PetscDSSetJacobian( ds.ds, fid, fid, ext.fns_jacobian[i_jac[self._hh_G0[k]]], NULL, NULL, NULL)
PetscDSSetJacobianPreconditioner(ds.ds, fid, fid, ext.fns_jacobian[i_jac[self._hh_G0[k]]], NULL, NULL, NULL)
PetscDSSetJacobianPreconditioner(ds.ds, fid, fid, ext.fns_jacobian[i_jac[hh_pc]], NULL, NULL, NULL)

cdef DMLabel c_label

Expand Down
Loading
Loading