Skip to content

docs: meta-scrub + research-tooling relocation + doc gaps (perfectionist pass PR-3)#614

Open
KedoKudo wants to merge 9 commits into
mainfrom
chore/meta-scrub-doc-gaps
Open

docs: meta-scrub + research-tooling relocation + doc gaps (perfectionist pass PR-3)#614
KedoKudo wants to merge 9 commits into
mainfrom
chore/meta-scrub-doc-gaps

Conversation

@KedoKudo

Copy link
Copy Markdown
Contributor

PR-3 — the final queue item of the pre-paper perfectionist pass (after #610 claim accuracy, #611 exact Doppler kernel, #612 MIT/metadata). The repository's prose now reads for a cold reviewer: no review-process metadata in production comments, no citations to artifacts a reader cannot open, no research-only tooling shipped in-tree, and the flagship API documented.

Net: −1592 lines (+707/−2299 across 48 files).

What changed

Meta scrub (~205 sites)

House rule: a comment says why the code is shaped this way; when / which PR / which reviewer lives in commit messages and PR bodies, where git preserves them.

Research tooling out of the repo

The 14 scripts/perf/* profiling drivers and 2 scripts/fixtures/extract_venus_*.py extractors exist only to read the private 38 GB VENUS cube; they now live with that data outside the repository (path depth, internal invocation paths, and SAMMY EMIN/EMAX wording updated; py_compile/bash -n verified; zero executable wiring referenced them). Public docs describe the provenance without naming paths nobody else has.

Doc gaps closed

  • spatial_map_typed — the pipeline's flagship spatial entry point had zero rustdoc; now documents input modes, the full up-front validation contract (per-domain value checks, active-bin scoping, the three rejected degenerate configs and why each would otherwise produce a silently all-NaN map), rayon parallelism, cancel/progress semantics, and the error variants.
  • lib.rs module lists for pipeline/io/fitting (9 modules were missing); descriptions taken from each module's own header. The hdf5-gated entries are deliberately not intra-doc links so standalone cargo doc -p nereids-io stays warning-free (verified).
  • 13 in-test println!s gone (11 pure dumps removed; the fixture-missing skip notice → eprintln!; the parser's summary prints → a real contract assertion). The deliberate measurement eprintln!s (samtry, VENUS, Doppler) untouched. Bare unreachable!() in the penetrability test helper now states its invariant.

Two real fixes the review process forced

  1. Cancellation contract (round 2): the new rustdoc promises PipelineError::Cancelled on every cancellation path, but the fit_temperature base-XS precompute wrapped its error with .map_err(PipelineError::Transmission), bypassing the purpose-built From conversion — a user cancelling during the expensive Reich-Moore precompute got an error toast instead of a clean cancel. Fixed with bare ? + a mutation-checked (3/3) regression test with engineered timing windows (caller-supplied precomputed XS removes the earlier window; a 100k-point grid stretches the target window to tens of ms; the assertion is timing-independent so the test cannot flake).
  2. Real-VENUS counts-KL regression gate (round 1): the scrub's reworded docs claimed real-data validation that no in-tree artifact substantiated. Instead of weakening the words, the new test_counts_kl_fit_matches_baseline runs the joint-Poisson fit on the committed aggregated VENUS Hf fixture: anchored density (2.911e-5) + deviance/dof (3.14e4 — the documented real-data saturation regime, asserted ≫ 1) + an anchor-independent physical bracket. Python suite 127 → 128.

Review

/review-pipeline, 3 cross-family rounds (Claude + Codex):

Round Outcome
1 1 VERIFIED P1 (my own rewrite misattributed the polish benchmark to "real VENUS data" — the scrub pass introducing a claim-accuracy bug, caught and fixed) + 1 NEEDS-VERIF P1 (resolved with the new real-data gate) + 7 P2s fixed
2 0 blocking; Codex's circularity P1 refuted (docs make two distinct claims — synthetic-validated vs regression-locked — and the test self-discloses its anchor role); 4 P2s fixed incl. the Cancelled-contract code fix
3 0 blocking; Codex's timing-test P1 refuted by direct measurement (verifier compiled a timing probe: 17/17 runs land in the changed call site, 25× debug margin); 4 P2s fixed incl. the standalone-doc-build breakage the reviewer empirically reproduced

Acceptance grep gates (all clean)

  • .research appears in no tracked file outside .gitignore, CHANGELOG history, and agent-process docs — including inside binary fixtures;
  • Codex|Copilot|codex04|memo 35|memo 38|EG[0-9]|round-N P[0-9]-class tokens: zero hits across crates/bindings/apps/tests;
  • scripts/perf|extract_venus: zero hits.

Gates

fmt · clippy -D warnings · cargo test --workspace (22 binaries, 0 failures; pipeline suite +2 tests) · cargo doc -D warnings (workspace and standalone nereids-io) · mdBook · maturin build · pixi run test-python (128 passed).

🤖 Generated with Claude Code

KedoKudo added 8 commits June 12, 2026 09:29
The 14 scripts/perf/* profiling drivers and the two
scripts/fixtures/extract_venus_*.py extractors exist only to read the
private 38 GB VENUS measurement cube under the gitignored research
archive — they cannot run for anyone without that data, and shipping
them in the repository points users at paths that do not exist for
them. They now live with the data they consume (outside the repo),
with their repo-root depth, internal invocation paths, and the
SAMMY EMIN/EMAX comment wording updated for the new location; all
verified with py_compile / bash -n. No executable wiring referenced
them (pixi tasks, CI, Cargo all checked).

Public docs that pointed at those scripts or at research-archive
paths now describe the provenance without the paths: the
tests/data/README regeneration note, the nexus.rs producer
attributions, the resolution.rs bit-exact-baseline comment, and the
PR template's audit-reference instructions.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
House rule: a comment says why the code is shaped this way; when it
changed, in which PR, and who reviewed it live in the commit message
and PR body, where git log and blame preserve them. Production
comments had accumulated ~140 reviewer/round/PR decorations
('Codex round-3 P3 on PR #480', 'Copilot review finding on PR #470',
'Independent review P1 reproduction', 'epic #472, PR #475', …).

Every decoration is now gone; every carried fact stays inline
('Codex PR #475 round 2 measured 73 % rel err' → 'measured: 73 % rel
err'; bench-off outcomes, error magnitudes, and defect descriptions
all preserved). Issue numbers that anchor the defect or contract
being guarded (#458, #608, #514, #517, #483 A1, …) are kept — issues
are the durable what-record. Algorithm terms that merely contain
'round' (round-trip, round-off, round-robin) and SAMMY manual
revision citations are untouched.

Meta-only test docstrings ('Regression for Round-2 review fix #4'),
the codex04 study citations, and rustdoc .research/ paths are
rewritten separately — this commit is the mechanical tail strip.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
…te results inline

Rendered rustdoc and production comments cited artifacts a reader
cannot open: gitignored .research/ paths (the algorithm-design
round-robin judgments, the counts-path governing-equations note, the
real-VENUS partial-GAL validation), private memo sections ('memo 35
§P2.1', 'EG5'), agent-session memory files
(feedback_synthetic_resolution_test_design,
feedback_bit_exact_oracle_verbatim), the codex04 contestant name, and
a 'byte-identical to main' branch reference. Every one is rewritten
to carry its content inline: the surrogate module now describes the
design study and keeps its measured compression table; joint_poisson
states the profiled-flux derivation and benchmark regimes in place;
the partial-GAL test explains why the real measurement cannot ship
as a fixture (ORNL release policy on the USR/FTS kernel) without
naming a directory nobody else has.

Meta-only test docstrings ('Regression for Round-2 review fix #4',
'Codex review: …', 'Round-1 audit flagged …') now state the pinned
invariant directly; issue anchors (#514, #486) stay. No behavioral
change — comments and docs only.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
- spatial_map_typed — the pipeline's flagship spatial entry point —
  had zero rustdoc.  It now documents the three input modes, the
  full up-front validation contract (shape checks, live-pixel value
  domains, the three rejected degenerate configurations and why each
  would otherwise yield a silently all-NaN map), the rayon per-pixel
  parallelism, cancel/progress semantics, per-pixel-failure-as-NaN
  behavior, and the error variants.
- The pipeline / io / fitting crate-level module lists had drifted:
  calibration, error, export, nexus, project, rebin, spectrum,
  active_mask, and forward_model were missing.  Every one-line
  description is taken from the module's own header doc.  physics
  already matched.
- The bare unreachable!() in penetrability's old_phase test helper
  now states its l = 1..=4 coverage invariant.
- The 13 in-test println!s are gone: 11 were value dumps whose
  assertions already carry the checks; the Hf-177 fixture-missing
  skip notice becomes eprintln! (notices go to stderr, matching the
  suite's measurement-output convention); the parser's two summary
  prints become a real contract assertion (total_resonance_count
  must cover the resolved range's per-L-group sum).  The deliberate
  measurement eprintln!s (samtry, VENUS, Doppler) are untouched.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
…comments

The acceptance grep for the meta scrub caught what the original
inventory's pattern missed: ~65 more references to private research
memos ('memo 35 §P1.2', 'memo 38 §6') and their benchmark shorthand
('EG2 S2 C_An', 'EG5', 'EG1') across the Python API docstrings
(__init__.pyi + the PyO3 rustdoc — user-visible on docs renders),
GUI hover text and comments, user-facing error message strings, and
the joint-Poisson/pipeline doc comments.

Same treatment as the rest of the scrub: every pointer to an
artifact a reader cannot open is replaced by the content it pointed
at ('memo 38 §6 — 17 min/pixel polish cost' → 'measured ~17
min/pixel polish cost'; 'EG2 S2 C_An → −23% density bias' →
'benchmarked at −23% density bias'), deferral labels become plain
deferrals, and two test assertions that matched on memo-section
substrings in error messages now match on the parameter names. One
test identifier carrying memo-section shorthand (p2_2) is renamed to
describe the rule it pins.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
…ce corrections

The scrub's own rewrites introduced two claim-accuracy regressions,
both caught by cross-family review:

- nelder_mead.rs attributed the 1/20→10/20 polish rescue to 'real
  VENUS data'; that regime is the synthetic counts benchmark — on
  real VENUS counts D saturates at 10^4-10^5 and polish cannot even
  self-terminate (the reason it defaults off).  The doc now states
  both regimes explicitly.
- joint_poisson/pipeline/.pyi claimed the counts-KL path was
  'validated experimentally on synthetic and real VENUS counts data'
  while this branch deleted the only real-data KL driver, leaving
  nothing in-tree to substantiate the real-data half.  The proper
  fix: a new real-data regression gate
  (TestVenusMlbwRegression::test_counts_kl_fit_matches_baseline)
  runs the joint-Poisson fit on the committed aggregated VENUS Hf
  fixture and anchors density (2.911e-5) and deviance/dof (3.14e4 —
  the documented real-data saturation regime, asserted >> 1).  The
  wording now points at that test.  Python suite: 127 → 128 (+~40 s,
  the cost of a real-data fit).

P2s: dangling §P2.2 and 'memo 38' references resolved; state.rs
sentence break restored; spatial_map_typed doc gains the reachable
PipelineError::Transmission variant and the live-pixel progress
target; stale absolute pipeline.rs line citations replaced with
function-name anchors; 'duck-tape' → 'duct-tape' (×2).

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
…ture precompute

The new spatial_map_typed rustdoc promises every cancellation path
returns PipelineError::Cancelled, but the fit_temperature base-XS
precompute wrapped its error with .map_err(PipelineError::Transmission),
bypassing the From<TransmissionError> impl whose whole purpose is the
Cancelled→Cancelled mapping — a user cancelling during the expensive
Reich-Moore precompute got an error toast instead of a clean cancel.
Bare ? now routes through the conversion (the pre-existing mis-mapping
predates this branch; the new doc made it a contract).

Regression test with engineered windows: caller-supplied precomputed
broadened XS removes the earlier (already correctly mapped) precompute
window, and a 100k-point grid stretches the base-XS window to tens of
ms while the watcher flips cancel at 5 ms — mutation-checked 3/3
(restoring the map_err fails every run with Transmission(Cancelled));
the assertion itself is timing-independent, so the test cannot flake.

Doc P2s: the npz regression fixture gets its provenance section in
tests/data/README.md (the deleted extraction script was its only
record); the validation bullet now states the deliberate active-bin
scoping for transmission cubes; the counts-KL gate docstring carries
the LM gate's cross-backend tolerance rationale.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
…recision

- The new nereids-io module list linked the hdf5-gated nexus/project
  modules, which breaks 'cargo doc -p nereids-io' under default
  features (-D broken_intra_doc_links) — the workspace doc job only
  passed via feature unification from apps/gui.  The two entries are
  plain code spans now; standalone doc build verified clean.
- The committed VENUS npz fixture embedded a description string
  pointing at the deleted extraction script and a research-archive
  path — the exact reference class this PR scrubs, baked into a
  shipped binary.  Regenerated with provenance wording matching
  tests/data/README.md; all six data arrays verified bitwise
  identical, and both real-data regression gates pass on the
  regenerated file (anchors untouched).
- spatial_map_typed's validation bullet now states each cube's actual
  domain (uncertainty is strictly positive, not merely non-negative).
- The counts-KL gate gains a coarse physical bracket
  (1e-5 < density < 1e-4) independent of the machine-generated
  anchor, so a future re-anchoring commit cannot silently absorb an
  order-of-magnitude solver regression.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.75000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.63%. Comparing base (3c5f6ea) to head (47e97fb).

Files with missing lines Patch % Lines
crates/nereids-pipeline/src/spatial.rs 92.30% 4 Missing ⚠️
crates/nereids-physics/src/penetrability.rs 0.00% 1 Missing ⚠️
crates/nereids-physics/src/reich_moore.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #614      +/-   ##
==========================================
+ Coverage   91.59%   91.63%   +0.03%     
==========================================
  Files          46       46              
  Lines       32227    32283      +56     
==========================================
+ Hits        29518    29582      +64     
+ Misses       2709     2701       -8     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…t gate

Windows CI lost the watcher race in
test_spatial_map_mid_run_cancellation_returns_err: on the saturated
4-core runner the busy-spinning watcher thread was starved long
enough for all 64 pixels AND the post-loop cancel check to finish
before its store became visible — a COMPLETE Ok map, which the test
treated as failure even though that outcome carries no information
about the regression under test (a PARTIAL Ok map).  The test now
distinguishes the three outcomes: Err(Cancelled) passes, a complete
map retries (bounded, 5 attempts), a partial map — the actual
regression — fails immediately on any attempt.  The watcher yields
instead of spinning so it gets scheduled on saturated runners.

Codecov flagged the reworded 'B_det wiring is deferred' error string
as the one uncovered patch line in pipeline.rs — its gate (non-zero
detector_background nuisance arm under the joint-Poisson solver) had
no test at all.  Added the contract test alongside the sibling gate
tests.

Assisted-With: Claude Fable 5 (1M context) <noreply@anthropic.com>
@KedoKudo

Copy link
Copy Markdown
Contributor Author

Windows CI failure + codecov disposition (addressed in 47e97fb):

  • Test (windows-latest): the pre-existing test_spatial_map_mid_run_cancellation_returns_err lost its watcher race on the saturated runner — the busy-spinning watcher thread was starved until all 64 pixels and the post-loop cancel check completed, producing a COMPLETE Ok map (n_converged: 64/64), which the test wrongly treated as the regression. The test now distinguishes the three outcomes — Err(Cancelled) passes, a complete map is an inconclusive race-loss and retries (bounded), a partial map (the actual regression it guards) fails immediately — and the watcher yields instead of spinning.
  • Codecov patch (3 lines, status checks already green): the pipeline.rs miss was the reworded "B_det wiring is deferred" error string, whose gate (non-zero detector_background under joint-Poisson) had no test — a real contract test now covers it. The other two are unexecutable-by-design: the unreachable! message in penetrability's test helper (executes only if the invariant breaks) and the Hf-177 fixture-missing skip notice in reich_moore (executes only on broken checkouts).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR performs a broad “meta scrub” and documentation hardening pass aimed at making the repository readable and self-contained for external reviewers, while relocating research-only tooling out of the tracked tree and adding/strengthening a couple of regression gates tied to committed VENUS fixtures.

Changes:

  • Remove review/round/PR/memo-style metadata from in-tree prose (rustdoc, tests, GUI strings, templates) while preserving the underlying technical facts.
  • Relocate/remove research-only VENUS extraction + performance profiling scripts from the repository; update fixture provenance docs accordingly.
  • Close key doc/testing gaps: add substantial rustdoc for spatial_map_typed, ensure cancellation maps to PipelineError::Cancelled, and add a real-VENUS counts-KL regression gate in Python tests.

Reviewed changes

Copilot reviewed 47 out of 48 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/test_nereids.py Scrubs review metadata in docstrings/comments and adds a real-VENUS counts-KL baseline regression test using committed fixtures.
tests/test_mcp.py Removes review-metadata phrasing from docstrings/comments in MCP-facing tests.
tests/data/README.md Updates VENUS fixture provenance/regeneration guidance without referencing private repo paths; documents the aggregated NPZ fixture.
scripts/python_api_allowlist.txt Removes private memo reference from API allowlist commentary.
scripts/perf/summarize_sample.py Removes research-only profiling helper script from the tracked tree.
scripts/perf/run_sample.sh Removes research-only macOS sampling wrapper script.
scripts/perf/run_sample_kl_periso_tzero.sh Removes research-only macOS sampling wrapper script.
scripts/perf/run_sample_b3.sh Removes research-only macOS sampling wrapper script.
scripts/perf/run_sample_b2.sh Removes research-only macOS sampling wrapper script.
scripts/perf/run_sample_b2_at_scale.sh Removes research-only macOS sampling wrapper script.
scripts/perf/run_sample_b1.sh Removes research-only macOS sampling wrapper script.
scripts/perf/profile_kl_periso_tzero.py Removes research-only profiling driver referencing private data paths.
scripts/perf/profile_b3_lm_periso_tzero.py Removes research-only profiling driver referencing private data paths.
scripts/perf/profile_b2_kl_grouped.py Removes research-only profiling driver referencing private data paths.
scripts/perf/profile_b2_kl_grouped_at_scale.py Removes research-only profiling driver referencing private data paths.
scripts/perf/profile_b1_lm_tzero_grouped.py Removes research-only profiling driver referencing private data paths.
scripts/perf/profile_a1_lm_grouped.py Removes research-only profiling driver referencing private data paths.
scripts/perf/baseline_dump.py Removes research-only baseline dump/verify harness from the tracked tree.
scripts/fixtures/extract_venus_hf_nexus.py Removes fixture extractor that depended on private VENUS cube; tooling moved out-of-repo.
scripts/fixtures/extract_venus_aggregated.py Removes aggregated VENUS extractor that depended on private VENUS cube; tooling moved out-of-repo.
crates/nereids-pipeline/src/spatial.rs Adds comprehensive rustdoc for spatial_map_typed, fixes cancellation error mapping in the temperature precompute path, and adds/updates cancellation regression tests.
crates/nereids-pipeline/src/pipeline.rs Updates joint-Poisson / counts-KL documentation and error messages to remove private memo anchors; adds a deferred-feature rejection test for non-zero detector background.
crates/nereids-pipeline/src/lib.rs Completes module list in crate-level rustdoc for pipeline crate.
crates/nereids-physics/tests/venus_usr_surrogate.rs Rewords design-study provenance text to remove contestant naming while keeping the technical claim.
crates/nereids-physics/src/transmission.rs Scrubs review metadata from doc comments in the transmission module.
crates/nereids-physics/src/surrogate.rs Generalizes design-study provenance wording and removes .research/ references while preserving math/measurement content.
crates/nereids-physics/src/slbw.rs Removes debug println! noise from tests.
crates/nereids-physics/src/resolution.rs Scrubs review-metadata phrasing from safety/docs/tests while preserving invariants and rationale.
crates/nereids-physics/src/reich_moore.rs Removes debug println!s from tests and changes a skip notice from stdout to stderr.
crates/nereids-physics/src/penetrability.rs Replaces bare unreachable!() with an invariant message in a test helper.
crates/nereids-io/src/project.rs Removes memo references from project snapshot field docs.
crates/nereids-io/src/nexus.rs Updates rustdoc to avoid referencing removed in-repo extractors while preserving the units/back-compat contract description.
crates/nereids-io/src/lib.rs Expands module list documentation (including non-link entries for hdf5-gated modules to keep standalone doc builds warning-free).
crates/nereids-fitting/src/transmission_model.rs Scrubs PR/review references and tightens explanatory comments around surrogate dispatch guards.
crates/nereids-fitting/src/poisson.rs Updates module docs to remove memo references while retaining the distinction between legacy helpers and production counts-KL path.
crates/nereids-fitting/src/nelder_mead.rs Rewrites motivation docs to remove private benchmark shorthand while preserving the performance/correctness rationale.
crates/nereids-fitting/src/lm.rs Cleans up regression-test doc comments to remove review-round metadata.
crates/nereids-fitting/src/lib.rs Completes crate-level module list documentation.
crates/nereids-fitting/src/joint_poisson.rs Removes .research//memo references from module/docs and clarifies the validation/regression-gate narrative.
crates/nereids-endf/src/parser.rs Replaces test println! output with an assertion that checks an invariant (total resonance count covers resolved sum).
bindings/python/src/lib.rs Scrubs memo references in PyO3-facing doc comments and keeps the binding-side validation narrative intact.
bindings/python/python/nereids/init.pyi Updates stub docstrings to remove memo anchors and reflect the real-VENUS regression gate claim.
apps/gui/src/widgets/design.rs Updates GUI overlay rationale comment to avoid memo references while pointing to the underlying pipeline behavior.
apps/gui/src/state.rs Removes memo references from persisted GUI state field docs.
apps/gui/src/guided/result_widgets.rs Rewords GOF labeling comment to remove memo anchor.
apps/gui/src/guided/analyze.rs Cleans hover/help text and result-label comment to remove memo references while preserving the user-facing meaning.
.github/pull_request_template.md Removes .research/ path mention and rephrases process guidance without review-metadata jargon.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Phase 0: Expand nereids-core with full type/trait/error system

2 participants