Flexible ranks: per-field coherence read-outcome diagnostic (miss rate / why a bounds cylinder reports infrequently) #742

## Summary

Add an always-on, per-field **coherence read-outcome diagnostic** to the
unequal-rank multi-source reader, so that when a bounds cylinder appears to
report infrequently we can tell *why* — a coherence problem (reads being
rejected) vs. a slow upstream sender (no new data) — and quantify how often
multi-source reads actually straddle a publish.

This is a follow-on to the flexible (unequal) rank assignments work and its
per-field coherence policy (see `doc/designs/flexible_rank_assignments.md`
§Coherence). The strict-coherence decision was made for
`BEST_XHAT`/`RECENT_XHATS` in #741 and for `DUALS` in the merged phase-3a work.

## Motivation

Under unequal ranks, a cylinder assembles each per-scenario field from several
peer ranks via the overlap map. The per-field coherence policy decides what
happens when those sources disagree on `write_id`:

- **strict** (`DUALS`, `BEST_XHAT`, `RECENT_XHATS`) → the read is **rejected and
  retried**;
- **relaxed** (`NONANTS_VALS`, `XFEAS`, ...) → the read is **accepted but
  blended** across iterations.

A bounds cylinder computes its bound from a field it *reads* (the Lagrangian
spoke reads `DUALS`; FWPH reads `BEST_XHAT`/`RECENT_XHATS`). If those reads keep
getting rejected, the cylinder rarely gets fresh input and so reports a new
bound infrequently. Today that symptom is indistinguishable from "the upstream
sender is just slow." This diagnostic separates the two.

It also empirically answers the open design question behind the strict-coherence
choice: *how often does a multi-source read actually straddle a publish?* If a
strict field's rejection rate is negligible, strict coherence is effectively
free; if it climbs (e.g. under an asynchronous APH sender), we learn it here
rather than in the field.

## What to measure

Per **field**, per **reader cylinder** (counters live on each `SPCommunicator`,
so attribution to a specific spoke+field is automatic), accumulate over
multi-source reads (reads with >= 2 sources; single-source reads can't miss):

- `total` — multi-source reads
- `not_new` — coherent, but `write_id` did not advance (sender hasn't published)
- `new_accepted` — advanced + coherent → used
- `rejected_incoherent` — sources disagreed → strict reject (strict fields only)
- `accepted_mixed` — sources disagreed but accepted (relaxed fields only)

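Derived: **coherence miss rate** = `(rejected_incoherent + accepted_mixed) /
total`. Only one of the two terms can be nonzero for a given field:
`rejected_incoherent` for strict fields, `accepted_mixed` for relaxed ones.
Diagnosis: `rejected_incoherent` dominating ⇒ coherence problem; `not_new`
dominating ⇒ slow sender.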

(Optionally also distinguish, for strict reads, a rejection caused by *this
rank's* sources disagreeing vs. the cross-reader `_write_ids_agree` collective
rejecting — the former is the fundamental coherence miss.)
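A minimal sketch of the counter record and the derived quantities, for discussion only. The class name `ReadOutcomeCounts` and its methods are hypothetical, not existing mpi-sppy API. The rule used for "dominating" (whichever of the two stall causes is larger, ties going to the sender) is an assumption that the real diagnostic may want to refine.

```python
from dataclasses import dataclass


@dataclass
class ReadOutcomeCounts:
    """Hypothetical per-field counters; field names mirror the list above."""
    total: int = 0
    not_new: int = 0
    new_accepted: int = 0
    rejected_incoherent: int = 0
    accepted_mixed: int = 0

    def miss_rate(self) -> float:
        # (rejected_incoherent + accepted_mixed) / total, 0.0 if nothing was read
        if self.total == 0:
            return 0.0
        return (self.rejected_incoherent + self.accepted_mixed) / self.total

    def dominant_stall(self) -> str:
        # Assumed rule: compare the two outcomes that leave the reader
        # without fresh input; ties are attributed to the sender.
        if self.rejected_incoherent == 0 and self.not_new == 0:
            return "none"
        if self.rejected_incoherent > self.not_new:
            return "coherence"
        return "slow sender"


duals = ReadOutcomeCounts(total=10, not_new=2, new_accepted=1,
                          rejected_incoherent=7)
print(duals.miss_rate(), duals.dominant_stall())
```

The outcome counters other than `total` are mutually exclusive per read, so they should sum to `total`; that invariant is a cheap sanity check for tests.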

## Where to hook

The single choke point is `reduce_source_write_ids(source_ids, strict)` and its
call site in `SPCommunicator._flex_get_multi_source` — it already has
`source_ids` and computes agreement. Keep `reduce_source_write_ids` pure; do the
counting at the call site (where `self` holds the counters). Cost is ~two integer
increments per multi-source read, so counting is always-on.
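The counting at the call site could look roughly like the sketch below. It does not call the real `reduce_source_write_ids`, since its return value is not described here; `classify_read`, `FieldReadStats`, and the `last_seen_id` argument are hypothetical stand-ins. It also assumes `write_id` values are integers that only increase.

```python
from collections import Counter


def classify_read(source_ids, last_seen_id, strict):
    """Hypothetical classifier for one multi-source read."""
    if len(set(source_ids)) > 1:
        # Sources disagree on write_id: strict fields reject, relaxed accept.
        return "rejected_incoherent" if strict else "accepted_mixed"
    if source_ids[0] <= last_seen_id:
        # Coherent, but the sender has not published since the last read.
        return "not_new"
    return "new_accepted"


class FieldReadStats:
    """Stand-in for the counters that would live on each SPCommunicator."""

    def __init__(self):
        self.counts = {}  # field name -> Counter of outcomes

    def record(self, field, source_ids, last_seen_id, strict):
        if len(source_ids) < 2:
            return None  # single-source reads cannot miss and are not counted
        outcome = classify_read(source_ids, last_seen_id, strict)
        counter = self.counts.setdefault(field, Counter())
        counter["total"] += 1   # the two increments per read
        counter[outcome] += 1
        return outcome


stats = FieldReadStats()
stats.record("DUALS", [3, 3], last_seen_id=2, strict=True)
stats.record("DUALS", [3, 4], last_seen_id=3, strict=True)
stats.record("NONANTS_VALS", [3, 4], last_seen_id=3, strict=False)
print(dict(stats.counts["DUALS"]))
```

Keeping the classifier a pure function, separate from the counter update, mirrors the suggestion to keep `reduce_source_write_ids` pure and makes the outcome logic testable without MPI.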

This instruments the **consumer's input reads**, not the bound scalars: the
bound fields (`OBJECTIVE_INNER_BOUND` / `OBJECTIVE_OUTER_BOUND`) are single-source
and never hit the coherence path.

## Reporting

- **Accumulate always** (negligible overhead), so a misbehaving run can be
  inspected without having pre-armed it.
- **Finalize summary**: concise per-field breakdown, rank-0-gated, printed only
  for fields that did multi-source reads. Aggregate across the reader cylinder's
  ranks with one MPI reduction at report time (not per read).
- **Opt-in periodic line** (flag, e.g. every N iterations) for live debugging.
- Expose the counters as an attribute for programmatic/test access.
- Inert at equal ranks (no multi-source reads) — a pure flex-path diagnostic.
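The finalize summary could be assembled along these lines. This is a sketch under stated assumptions: `sum_over_ranks` is a plain-Python stand-in for the single MPI sum-reduction at report time (so the example runs without MPI), each rank's counters are packed as a fixed-order vector per field, and the function names and the line format are hypothetical.

```python
KEYS = ("total", "not_new", "new_accepted",
        "rejected_incoherent", "accepted_mixed")


def sum_over_ranks(per_rank):
    """Elementwise sum of per-rank counter vectors.

    Stand-in for one MPI sum-reduction over the reader cylinder's ranks;
    a real reduction would need every rank to pack the same fields in
    the same order.
    """
    totals = {}
    for rank_counts in per_rank:
        for field, vec in rank_counts.items():
            acc = totals.setdefault(field, [0] * len(KEYS))
            for i, value in enumerate(vec):
                acc[i] += value
    return totals


def summary_lines(totals):
    """One line per field that did at least one multi-source read."""
    lines = []
    for field in sorted(totals):
        counts = dict(zip(KEYS, totals[field]))
        if counts["total"] == 0:
            continue  # inert fields stay out of the report
        miss = (counts["rejected_incoherent"]
                + counts["accepted_mixed"]) / counts["total"]
        body = " ".join(f"{k}={counts[k]}" for k in KEYS)
        lines.append(f"{field}: {body} miss_rate={miss:.1%}")
    return lines


per_rank = [
    {"DUALS": [4, 1, 1, 2, 0]},
    {"DUALS": [6, 2, 1, 3, 0], "XFEAS": [0, 0, 0, 0, 0]},
]
for line in summary_lines(sum_over_ranks(per_rank)):
    print(line)  # in a real run only rank 0 would print
```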

## Suggested placement

A small follow-up branch on top of the flex stack, paired with the APH
verification phase (currently DLWoodruff/mpi-sppy-1#17 / phase 5): the async
sender is the one place misses actually occur, so its integration test is the
natural test bed, and the counter doubles as that phase's verification.

## Related

- Per-field coherence policy & strict-coherence decision: #741,
  `doc/designs/flexible_rank_assignments.md` §Coherence
- CG multi-rank deadlock fix that enabled the XFEAS path: #737 (closed #729)
- CG-hub XFEAS coverage / `--ph-xfeas-spoke-rank-ratio`: #730 (closed by #741)

