Skip to content

Close DWI parity family + fix TSV BOM bug (52→58 codes)#3

Merged
neurolabusc merged 7 commits into
mainfrom
crafted-fixtures-dwi-parity
Jun 22, 2026
Merged

Close DWI parity family + fix TSV BOM bug (52→58 codes)#3
neurolabusc merged 7 commits into
mainfrom
crafted-fixtures-dwi-parity

Conversation

@neurolabusc

Copy link
Copy Markdown
Contributor

A step-change toward Deno parity, plus a real correctness fix surfaced along the way.

Summary

Implemented-code coverage 52 → 58 (tests/parity/ISSUE_COVERAGE.md), one parser bug fixed, and 7 modality datasets added to the local parity sweep.

Correctness fix

  • UTF-8 BOM in TSV parsing. parse_tsv now strips a single leading BOM, matching Deno's TextDecoder. Without it the first header read as "<name>", so the expected column (participant_id/onset/filename/name) was reported both MISSING and as an undefined extra column — 48 spurious issues on a BOM-prefixed dataset. Affects any TSV exported from Excel/Windows tooling. Unit-tested.

New parity codes (verified 1:1 vs @bids/validator 2.4.1)

  • UNKNOWN_BIDS_VERSION (from ds114).
  • DWI rules.checks.dwi.* family: VOLUME_COUNT_MISMATCH, BVAL_MULTIPLE_ROWS, BVEC_NUMBER_ROWS, DWI_MISSING_BVAL, DWI_MISSING_BVEC. These were "fixture-only" — the generic check engine + NIfTI/bval/bvec context already implemented them, but bids-examples ships empty placeholder .nii.gz so nifti_header was null and the checks never ran.

Durable tests

  • crates/bids-core/tests/dwi_checks.rs — synthetic DWI datasets (valid 4D NIfTI header + crafted bval/bvec) asserting each code fires at the right location. CI-gated, no Deno/external data.
  • crates/bids-core/tests/common/mod.rs — shared fixture builder (avoids a 4th copy of the per-test builder).

Parity-suite coverage (gitignored local fetch)

ds114 (DWI), emg_Multimodal, fnirs_tapping, eeg_rest_fmri, asl001, pet003 — first DWI/EMG/NIRS/ASL/PET coverage.

Method note

The fetch-only harvest (valid example datasets) mostly yields bug-fixes + coverage, not new codes — valid data doesn't trip error checks, and empty placeholder NIfTI keeps header-dependent checks dormant. Header-dependent codes are closed via crafted synthetic fixtures verified against Deno, then committed as CI-gated tests. See CLAUDE.md "Open follow-ups" for the next families.

Validation

cargo fmt, clippy -D warnings, cargo test --workspace all green. Parity sweep green on all present datasets (pre-existing ds005016 gap unchanged).

🤖 Generated with Claude Code

neurolabusc and others added 7 commits June 22, 2026 09:58
First iteration of the parity-gap loop, targeting DWI.

- Add ds114 (raw DWI: inherited dwi.bval/dwi.bvec + per-subject dwi.nii.gz)
  to scripts/fetch_bench_data.py and tests/parity/run.py. Gitignored local
  fetch, same model as the electrophysiology datasets.
- Promote UNKNOWN_BIDS_VERSION into the RUST_CODES contract: Rust already
  emits it correctly and it matches Deno 1:1 on ds114 (an unknown BIDSVersion
  in dataset_description.json). 52 -> 53 implemented; ISSUE_COVERAGE.md updated.

Finding (recorded in CLAUDE.md): bids-examples ships empty placeholder
.nii.gz, so NIfTI-header-dependent checks (volume counts, dims,
*_NOT_MATCHING_NIFTI) stay dormant -- Deno emits NIFTI_HEADER_UNREADABLE
(140x here) where Rust reads nothing. Closing those needs fixtures with
valid tiny NIfTI headers (pet002 approach), tuned per check -- not bulk
fetching. ds114 alone closed only UNKNOWN_BIDS_VERSION.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harvest loop, batch 1. The new datasets surfaced a real parity bug rather
than new codes (valid example datasets don't trip error codes):

- parse_tsv now strips a single leading UTF-8 BOM, matching Deno's
  TextDecoder. Without it the first header read as "\u{feff}<name>", so the
  expected column (participant_id/onset/filename/name) was reported both
  MISSING and as an undefined extra column — 48 spurious issues on
  fnirs_tapping, now gone. Added a unit test.
- Add emg_Multimodal, fnirs_tapping, eeg_rest_fmri to the parity suite
  (gitignored local fetch). All green; extra regression coverage for
  EMG/NIRS/EEG channel + events handling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harvest loop, batch 2. Both green; no new codes and no divergences (valid
example datasets don't trip error codes). Kept for modality-breadth
regression coverage — ASL/perf and PET were previously uncovered by the
parity suite. Gitignored local fetch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DWI volume/bval/bvec checks were "fixture-only": the generic check
engine and the NIfTI/bval/bvec eval context already implement them, but no
parity dataset exercised them (bids-examples ships empty placeholder
.nii.gz, so nifti_header is null and the volume checks never run).

- Add crates/bids-core/tests/dwi_checks.rs: synthetic DWI datasets (valid
  4D NIfTI header + crafted bval/bvec) that assert each code fires with the
  correct location. CI-gated, no Deno/external data. Each case was
  confirmed 1:1 vs @bids/validator 2.4.1 on the identical tree.
- Add crates/bids-core/tests/common/mod.rs: shared build_ctx/infer_datatype
  fixture builder (the older integration tests predate it; this avoids a
  fourth copy).
- Promote VOLUME_COUNT_MISMATCH, BVAL_MULTIPLE_ROWS, BVEC_NUMBER_ROWS,
  DWI_MISSING_BVAL, DWI_MISSING_BVEC to RUST_CODES. ISSUE_COVERAGE.md:
  53 -> 58 implemented, 125 -> 120 missing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same crafted-fixture method as the DWI family. All six were fixture-only:
the generic engine, the magnitude/magnitude1/bval associations, and the
NIfTI header context already implement them. Each case confirmed 1:1 vs
@bids/validator 2.4.1 on the identical synthetic tree.

- Add crates/bids-core/tests/fmap_checks.rs (uses tests/common builder).
- Promote FIELDMAP_WITHOUT_MAGNITUDE_FILE, MISSING_MAGNITUDE1_FILE,
  ECHOTIME1_2_DIFFERENCE_UNREASONABLE, MAGNITUDE_FILE_WITH_TOO_MANY_DIMENSIONS,
  EPI_WITH_BVALS_NEEDS_SMALL_BVALS, TOTAL_READOUT_TIME_MUST_DEFINE to
  RUST_CODES. ISSUE_COVERAGE.md: 58 -> 64 implemented, 120 -> 114 missing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4 codes)

Same method. All fixture-only — the parsed NIfTI header context already
implements them. Each confirmed 1:1 vs @bids/validator 2.4.1.

- Add crates/bids-core/tests/nifti_checks.rs.
- Promote T1W_FILE_WITH_TOO_MANY_DIMENSIONS, BOLD_NOT_4D, NIFTI_DIMENSION,
  NIFTI_PIXDIM to RUST_CODES. ISSUE_COVERAGE.md: 64 -> 68 implemented,
  114 -> 110 missing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Acting on the external review of PR #3 (crafted-fixture work):

- tsv.rs: strip the UTF-8 BOM in parse_tsv_body_with_headers too, so
  BOM-prefixed .tsv.gz bodies match Deno (the earlier fix only covered
  parse_tsv). Added a regression test.
- run.py: move machine-specific / known-red datasets (ds000003, ds002606,
  ds005016) into an opt-in OPT_IN_DATASETS dict. The no-argument sweep now
  runs only the expected-green set and is a clean pass/fail signal; opt-in
  datasets run when named. Fixed the pre-existing empty-set literal nit.
- ISSUE_COVERAGE.md: SUBJECT_FOLDERS was in RUST_CODES but listed missing —
  moved to implemented; counts corrected to 69 implemented / 109 missing.
  Documented the dataset-parity vs crafted-fixture evidence distinction.
- tests/common/mod.rs: host the shared minimal NIfTI header builder (was
  triplicated across dwi/fmap/nifti checks); note that build_ctx leaves
  dataset_modalities/subjects empty.
- audit_response.md / CLAUDE.md updated; audit_temp.md removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@neurolabusc neurolabusc merged commit 30458fc into main Jun 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant